BIOMETRIKA 


FOUNDED BY

W. F. R. WELDON, FRANCIS GALTON and KARL PEARSON

MANAGING EDITOR 

E. S. PEARSON 

ASSOCIATE EDITORS 

M. G. KENDALL JOHN WISHART 

in consultation with 

HARALD CRAMER J. B. S. HALDANE 

R. C. GEARY MAJOR GREENWOOD 


VOLUME XXXVI


1949 


ISSUED BY 

THE BIOMETRIKA OFFICE, UNIVERSITY COLLEGE, LONDON 
PRINTED AT THE UNIVERSITY PRESS, CAMBRIDGE 




CONTENTS OF VOLUME XXXVI 

Memoirs

PAGES

I. The infectiousness of measles. By Major Greenwood . . . 1–8

II. A note on the analysis of grouped probit data. By K. D. Tocher . . . 9–17

III. A generalization of Poisson’s binomial limit for use in ecology. By Marjorie Thomas . . . 18–25

IV. The estimation and comparison of residual regressions where there are two or more related sets of observations. By A. H. Carter . . . 26–46

V. Cumulants of multivariate multinomial distributions. By John Wishart . . . 47–58

VI. On the Wishart distribution in statistics. By A. C. Aitken . . . 59–62

VII. The spectral theory of discrete stochastic processes. By P. A. P. Moran . . . 63–70

VIII. On a property of distributions admitting sufficient statistics. By V. S. Huzurbazar . . . 71–74

IX. On a method of trend elimination. By M. H. Quenouille . . . 75–91

X. On the estimation of dispersion by linear systematic statistics. By H. J. Godwin . . . 92–100

XI. On the reconciliation of theories of probability. By M. G. Kendall . . . 101–116

XII. The derivation and partition of χ² in certain discrete distributions. By H. O. Lancaster . . . 117–129

XIII. A note on the subdivision of χ² into components. By J. O. Irwin . . . 130–134

XIV. The first and second moments of some probability distributions arising from points on a lattice and their application. By P. V. Krishna Iyer. With one Figure in the Text . . . 135–141

XV. Probability tables for the range. By E. J. Gumbel. With two Figures in the Text . . . 142–148

XVI. Systems of frequency curves generated by methods of translation. By N. L. Johnson. With twelve Figures in the Text . . . 149–176

XVII. Rank and product-moment correlation. By M. G. Kendall . . . 177–193

XVIII. Tests of significance in harmonic analysis. By H. O. Hartley . . . 194–201

XIX. The non-central χ² and F-distributions and their applications. By P. B. Patnaik. With four Figures in the Text . . . 202–232

XX. The estimation of the parameters of tolerance distributions. By D. J. Finney . . . 239–256

XXI. An overlap problem arising in particle counting. By P. Armitage. With one Figure in the Text . . . 257–266

XXII. Tables of autoregressive series. By M. G. Kendall . . . 267–289

XXIII. Tables for use in comparisons whose accuracy involves two variances, separately estimated. By Alice A. Aspin. With an Appendix by B. L. Welch . . . 290–296







XXIV. Bivariate distributions based on simple translation systems. By N. L. Johnson. With seventeen Figures in the Text . . . 297–304

XXV. A test for randomness in a sequence of two alternatives involving a 2 × 2 table. By P. G. Moore . . . 305–316

XXVI. A general distribution theory for a class of likelihood criteria. By G. E. P. Box . . . 317–346

XXVII. Note on approximations to the power function of the ‘2 × 2 comparative trial’. By G. P. Sillitto. With one Figure in the Text . . . 347–352

XXVIII. The distribution of ‘Student’s’ t in random samples of any size drawn from non-normal universes. By A. K. Gayen. With four Figures in the Text . . . 353–369

XXIX. The combination of probabilities arising from data in discrete distributions. By H. O. Lancaster. With one Figure in the Text . . . 370–382

XXX. Note on the application of Fisher’s k-statistics. By F. N. David . . . 383–393

XXXI. The moments of the z and F distributions. By F. N. David . . . 394–403

XXXII. The method of frequency-moments and its application to type VII populations. By Herbert S. Sichel . . . 404–425

XXXIII. On the use of Student’s t-test in an asymmetrical population. By S. G. Ghurye . . . 426–430

XXXIV. Tables of symmetric functions. Part I. By F. N. David and M. G. Kendall . . . 431–449


Miscellanea

(i) On a method of estimating frequencies. By D. J. Finney . . . 233–234

(ii) A further note on the mean deviation from the median. By K. R. Nair . . . 234–235

(iii) Review of Harold Jeffreys’ Theory of Probability. By F. N. David . . . 236

(iv) Review of Karl Pearson’s Early Statistical Papers. By G. U. Yule . . . 237–238

(v) On the efficiency of the method of moments and Neyman’s type A distribution. By L. R. Shenton . . . 450–454

(vi) Large-sample theory of sequential estimation. By F. J. Anscombe . . . 455–458

(vii) A historical note on the method of least squares. By R. L. Plackett . . . 458–460

(viii) The characteristic function of a weighted sum of non-central squares of normal variates subject to s linear restraints. By G. I. Bateman . . . 460–462

(ix) Intra-class rank correlation. By J. W. Whitfield . . . 463–467

(x) A note on non-normal correlation. By J. B. S. Haldane . . . 467–468

(xi) Review of F. N. David’s Probability Theory for Statistical Methods. By C. A. B. Smith . . . 469–470

(xii) Review of T. L. Kelley’s The Fundamentals of Statistics. By F. N. David . . . 470



Vol. XXXVI Parts I and II 


June 1949 




Reprinted by offset-litho, 1960




[Issued 22 July 1949] 






THE INFECTIOUSNESS OF MEASLES 

By MAJOR GREENWOOD 

Measles, once a deadly, still a very common disease, is a worthy object of study for the 
biometrician. Although it is not now a very important cause of death or invalidity, it has been 
and may again become important. Then it is a virus disease, and virus diseases, for instance, 
influenza and poliomyelitis, are very important killing or maiming diseases; if we really knew 
precisely how measles spreads, that knowledge might help us to understand how these more 
serious illnesses are passed on. The statistical literature of measles is enormous, but, as 
I shall illustrate, it is easy to misinterpret unhomogeneous data, and really precise observations of the natural history of measles are rare.

Some opinions on the aetiology of measles are old and universally held by physicians. The
first is that the length of time during which an infected person can pass the virus to another
person is short and is over when the patient shows the characteristic signs of the
disease: the typical rash, etc. The second is that the interval between the moment of infection,
viz. reception of an effective dose of virus, and the appearance of symptoms or signs of
illness is about 14 days; ‘From 7 to 18 days; oftenest 14’ is a common statement. The third
is that droplet infection by coughing, spitting or contamination from sputum, etc., is
responsible for the immense majority of infections. I see no reason to doubt that these
statements are, broadly speaking, true; they are, however, vague. According to N.E.D.,
incubation is ‘The process or phase through which the germs of disease pass between contagion
or inoculation and the development of first symptoms’. Symptoms, of course, are
subjective, but probably the lexicographer includes physical signs, e.g. running from eyes or
nose, rash, etc. This interval can only be precisely determined when the child has had but one
exposure to another infected child.

Such precise information is not abundant in the enormous ‘literature’ of measles because it
implies exact knowledge of contacts, which is only to be had in country districts where the
medical practitioner is fully acquainted with the social habits of his patients. Excellent
examples are to be found in Dr W. N. Pickles’s book, Epidemiology in Country Practice
(1939, see pp. 32-6). On the other hand, abundant data of intervals between successive cases
in families are available. In the second column of Table 1 are the figures provided by Stocks
& Karn (1928); these are obtained from the record cards of all cases of measles notified in the
Metropolitan Borough of St Pancras from March 1924 to March 1927. The authors say
explicitly that the interval is that between the first appearance of the rash in successive
cases. In this frequency distribution all the data are included. A family with two infected
children could only provide a single item, but a family with three would provide two, and so
on. A medical reader would hold that none of the first four or five frequencies included
children really infected within the family, but that they, like the first child in the family to go
down, caught the disease elsewhere. Then, noting that from the sixth day onwards the
frequencies increase to a maximum and decrease, he would say that, from the sixth day
onwards, the proportion really infected within the family increased. Of course a biometrician
wishes to do better than this; he would like to dissect the compound frequency into two
components.


The easiest of all dissections is into two Poisson frequencies (see W. Schilling, 1947). I tried
this on the St Pancras data but the result was execrable; the summed frequencies from
interval 9 onwards were quite good fits, but from 1 to 8 they were hopeless: the computed frequencies
were 310, 143, 56, 47, 80, 136, 204, 266. A better result was reached by pure empiricism. I
used all the frequencies from interval 7 as they stood, guessed values (4, 10, 22, 43, 74, 116)
for the first six and then fitted a Pearsonian type III curve to the product of my cookery. It was
not too bad, but of no scientific value. My algebra is not equal to fitting a type III curve
scientifically from its tail. But even when that problem has been solved, it does not seem
that we should have a unique frequency distribution of intervals.

Table 1. Intervals (days) between successive cases of measles in families
(intervals of 0 or of more than 20 days excluded)

| Interval | St Pancras | % | Providence, R.I. | % | Providence, R.I. (between first and second in families of three) | % |
|---|---|---|---|---|---|---|
| 1 | 341 | 9.88 | 263 | 6.07 | 61 | 8.54 |
| 2 | 246 | 7.13 | 184 | 4.42 | 43 | … |
| 3 | 160 | 4.64 | … | 2.69 | 21 | 2.94 |
| 4 | 117 | 3.39 | 110 | 2.64 | 26 | 3.60 |
| 5 | 96 | 2.78 | 111 | 2.66 | 20 | 2.80 |
| 6 | 90 | 2.81 | 114 | 2.74 | 20 | 2.80 |
| 7 | 99 | 2.87 | 189 | 4.64 | 40 | 6.60 |
| 8 | 173 | 6.01 | 247 | 5.92 | 41 | 5.74 |
| 9 | 206 | 5.97 | 388 | 9.31 | 75 | … |
| 10 | 318 | 9.22 | 619 | 12.40 | 110 | 15.41 |
| 11 | … | … | 511 | 12.27 | 72 | 10.08 |
| 12 | 329 | 9.64 | 461 | 10.83 | 66 | 9.24 |
| 13 | 269 | 7.80 | 344 | 8.26 | 63 | 7.42 |
| 14 | 205 | 5.94 | 244 | 5.86 | 39 | 6.46 |
| 15 | … | 4.43 | 142 | 3.41 | 14 | 1.96 |
| 16 | 104 | … | 105 | 2.52 | 8 | 1.12 |
| 17 | … | … | 58 | 1.39 | 2 | … |
| 18 | 48 | … | 37 | 0.89 | 0 | nil |
| 19 | 38 | … | 25 | 0.60 | 2 | … |
| 20 | 35 | 1.01 | 26 | 0.62 | 2 | … |
| Totals | 3460 | | 4166 | | 714 | |

The third column of the
table gives the findings of Wilson, Bennett, Allen & Worcester (1938) for Providence, Rhode
Island, 1929-34, and the fifth column their counts of intervals between first and second cases
in families in which three children were affected. Wilson et al. do not explicitly say how they
measured the interval (they may, perhaps, have taken the interval between first signs of illness
instead of between dates of rash), but I do not think the point material. It is obvious that the
total Providence frequency is not congruent with the St Pancras frequency, and that the
Providence selection is not congruent with the total. I am speaking from the biometric
standpoint, i.e. that no two of them could be regarded (using the χ² test) as drawn from a
common population (naturally in comparing the Providence sets one has to subtract the
frequencies of the selection from those of the total). But the unstatistical epidemiologist
would say, with respect to the totals, that they are very similar; it is true that the St Pancras
mode is a day later than the Providence mode and much taller, but at least we can say that 
the most frequent interval is 10 or 11 days and the decline on the short side faster than on the 
long side. This is arithmetically vague, but good enough to justify us in thinking that in very 
few cases following a ‘primary’ within 5 or 6 days the patients were infected by the ‘primary’,
and a great majority of those which were more than 8 days later were so infected. That 
information is enough to show how easily one may draw false conclusions from incomplete 
information, as I shall illustrate on a blunder of my own. 
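The biometric comparison described above (a χ² test of whether two interval distributions could come from a common population, after subtracting the Providence selection from the Providence total) can be sketched as follows. This is a minimal illustration assuming the ordinary Pearson χ² for a 2 × k table; the function names are hypothetical and the frequency lists in the usage lines are made up for illustration, not taken from Table 1.

```python
def remove_selection(total, selection):
    """Frequencies in `total` that are not part of the sub-series `selection`."""
    if len(total) != len(selection):
        raise ValueError("frequency lists must have the same length")
    rest = [t - s for t, s in zip(total, selection)]
    if min(rest) < 0:
        raise ValueError("selection exceeds the total in some class")
    return rest


def chi2_homogeneity(a, b):
    """Pearson chi-square for the 2 x k table whose rows are frequency lists a and b.

    Returns (statistic, degrees of freedom). Classes empty in both rows are skipped.
    """
    if len(a) != len(b):
        raise ValueError("frequency lists must have the same length")
    na, nb = sum(a), sum(b)
    grand = na + nb
    stat, classes = 0.0, 0
    for fa, fb in zip(a, b):
        col = fa + fb
        if col == 0:
            continue
        classes += 1
        ea = na * col / grand  # expected count in row a if both rows share one population
        eb = nb * col / grand
        stat += (fa - ea) ** 2 / ea + (fb - eb) ** 2 / eb
    return stat, classes - 1


# Hypothetical frequencies, for illustration only
total = [30, 40, 50]
selection = [10, 20, 20]
rest = remove_selection(total, selection)  # [20, 20, 30]
stat, df = chi2_homogeneity(selection, rest)
print(round(stat, 4), df)  # 2.0571 2
```

The selection is compared with the remainder rather than with the whole total, because the total contains the selection and the two would not be independent samples.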

Eighteen years ago I interested myself in the question whether any light could be thrown
on the mechanics of infection by studying arithmetically the frequency distribution of
multiple cases of disease in families. There should be, for instance, a contrast between
multiple cases of enteric fever, which is certainly not conveyed by coughing and spitting,
and multiple cases of measles, which almost certainly is. Unfortunately, data of multiple
cases of enteric in families are hard to come by; but those of measles are common. I knew
Dr Stocks was working on St Pancras records of measles and asked him to let me have some
data. He kindly supplied me with four sets in which there were respectively 2, 3, 4 and 6
children under 10, in addition to the child first infected, and three shorter series in which the
children exposed were known not to have had measles before. I cite only series in which 60
families were available. It was quite obvious that these data could not be fitted by ‘straight’
binomials for which the exponents were the numbers in family (other than the first child to go
sick), and p was given by the ratio of total cases to total of exposed children. Then the idea
of a chain suggested itself. Suppose the first child distributes infection binomially; he may
infect no child or he may infect all the n exposed, with chances qⁿ and pⁿ (p being unknown);
in these instances no chain arises. But every other term of his binomial provides an
opportunity; if he has infected n − 1 (his chance of doing so being npⁿ⁻¹q), this leaves one child
unaffected, who is exposed to infection from one of these (n − 1) secondaries who distributes
binomially with chance p′, and so one could have a family in which all n were infected. Now
if each new binomial distributor does so with a different p, we shall have arithmetical
difficulties. Biologically it seemed very unlikely that the chaining p’s would all be equal. Dr
Stocks had in fact made it probable that exposure to infection which produces no clinical
signs or symptoms of disease may confer some immunity. Hence the (n − 1)th child in series
is not so likely to infect the nth as the first to infect the second. However, it is sound
empiricism to begin with the simplest hypothesis and to see if it works. Now if p is constant, the
algebra and arithmetic are simple enough. Take n = 2. Then q² should give the proportion of
families with no cases other than the primary, 2q²(1 − q) that of one case, and 1 − 3q² + 2q³ that
of two cases. The mean will be equal to 2 − 4q² + 2q³, so we can solve for q. If n were 3 the
highest power of q in the equation would be 6, if n were 4 it would be 10, and so on.* The
arithmetic would become tedious as n increased, but in real life, in these days of small families,
it would be rare to have adequate data beyond families of 3!
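The constant-p chain for n = 2 can be checked numerically. The sketch below, assuming p = 1 − q is the same at every link, codes the three proportions and the mean 2 − 4q² + 2q³, and recovers q from an observed mean by bisection (the mean falls steadily from 2 to 0 as q runs from 0 to 1). The function names are hypothetical; the usage lines borrow the Providence gross totals 51, 67 and 298 quoted later in the paper purely as convenient input.

```python
def chain_probs_n2(q):
    """Chance of 0, 1 and 2 cases besides the primary when n = 2 and p = 1 - q throughout."""
    return [q ** 2, 2 * q ** 2 * (1 - q), 1 - 3 * q ** 2 + 2 * q ** 3]


def chain_mean_n2(q):
    """Expected number of further cases: 2 - 4q^2 + 2q^3."""
    return 2 - 4 * q ** 2 + 2 * q ** 3


def solve_q_n2(observed_mean, tol=1e-12):
    """Find q in [0, 1] with chain_mean_n2(q) equal to observed_mean, by bisection."""
    if not 0 <= observed_mean <= 2:
        raise ValueError("mean must lie between 0 and 2")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chain_mean_n2(mid) > observed_mean:
            lo = mid  # mean still too high, so q must be larger
        else:
            hi = mid
    return (lo + hi) / 2


counts = [51, 67, 298]  # families with 0, 1, 2 further cases
N = sum(counts)
mean = sum(k * f for k, f in enumerate(counts)) / N
q = solve_q_n2(mean)
expected = [N * pr for pr in chain_probs_n2(q)]
print(round(q, 4), [round(e, 1) for e in expected])
```

Fitting by the mean is the method the text describes; it forces the expected mean to agree with the observed one but says nothing by itself about goodness of fit.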

It is important to note that the assumption is made that the distribution of children infected
by the first child, the primary, must be strictly binomial; as we shall soon see, this
assumption may not be justified.

I applied this process to Dr Stocks’s data, and in every instance the fit was satisfactory.
For n = 2, in the less select and longer series (358 families) P† was 0.75; in the smaller but
more select (299 families), P = 0.58. For n = 3, P was 0.36 for the longer and 0.42 for the
shorter series; for n = 4, 0.61 and for n = 6 (only 61 families) 0.42. I must confess that the
concordance pleased me and, no doubt, lulled a scepticism which, I think, is my usual habit
of mind. Really to verify the theory, one should break up the groups into their constituents;
thus in the simplest case, that of n = 2, the proportion due to the primary is (1 − q)² and the
proportion derived by chaining 2(1 − q)²q. Well, was it? I do not know; it is possible that
Dr Stocks could not have supplied the information and certain, in view of what I have
already said of the difficulty of dissecting the frequency distributions of intervals, that the
information could not be absolutely precise. But that is no excuse at all for not even asking!

* See Greenwood (1931).

† Here and below P relates to the χ² test for goodness of fit, standing for the chance of a result which
is at least as divergent as that observed.

In the excellent memoir of Wilson and his colleagues already quoted, it is conclusively 
demonstrated that, to their data, this simple hypothesis does not apply. Like me, they found 
usually excellent fits to the gross data; for 416 families with n = 2, the value of P was 0.94,
for another set of 185 families, 0.68, and for 151 families with three susceptibles P = 0.48.

But, when the clubbed frequencies were dissected, the chain theory was demolished; in 
every instance the proportion infected by the primary was far larger and the proportion 
derived from chains far smaller than it should have been. One example (Wilson et al. 1938, 
p. 449) will suffice. 

The gross data consisted of 151 families with three susceptibles (other than the primary)
giving 10 (9.4) 0’s, 7 (6.5) 1’s, 39 (33.1) 2’s and 95 (101.0) 3’s. The figures in brackets are the
expected numbers using the chain method with q = 0.392, and show an excellent fit, P = 0.48.
Now let us break up the 3’s into their chain constituents. We have, with the observed numbers now in brackets,

$$6Np^3q^3 = 12.7\ (4), \qquad 3Np^3q^2 = 15.6\ (3), \qquad 3Np^3q = 39.8\ (13), \qquad Np^3 = 33.9\ (76).$$
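The split of the 3’s into chain constituents can be generated by enumerating every chain. The sketch below assumes, as the four constituent terms above imply, that at each stage every remaining susceptible escapes with chance q however many cases the preceding stage produced; a chain is recorded as the number of new cases at each stage, so (1, 2) means the primary infected one child, who then infected the other two. The function name `chains` is hypothetical.

```python
from math import comb


def chains(n, q):
    """Probability of each chain for n susceptibles besides the primary.

    Each key is a tuple of new cases per stage; () means no further cases.
    At every stage each remaining susceptible escapes with chance q.
    """
    p = 1.0 - q
    out = {}

    def extend(remaining, chain, prob):
        for k in range(remaining + 1):
            pk = prob * comb(remaining, k) * p ** k * q ** (remaining - k)
            if k == 0:
                out[chain] = pk  # nobody new infected: the chain stops
            elif k == remaining:
                out[chain + (k,)] = pk  # no susceptibles left: the chain stops
            else:
                extend(remaining - k, chain + (k,), pk)

    extend(n, (), 1.0)
    return out


N, q = 151, 0.392
ch = chains(3, q)
totals = [0.0] * 4
for chain, pr in ch.items():
    totals[sum(chain)] += pr
print("expected 0-3 cases:", [round(N * t, 1) for t in totals])
for chain, pr in sorted(ch.items()):
    if sum(chain) == 3:  # the constituents of the 3's
        print(chain, round(N * pr, 1))
```

The chains (1, 1, 1), (1, 2), (2, 1) and (3,) correspond to the terms 6Np³q³, 3Np³q², 3Np³q and Np³; with n = 2 the same function gives the split (1 − q)² and 2(1 − q)²q mentioned earlier.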

Perhaps it may be said that the allocation to the several groups depends on personal opinion; 
but in this instance—and, I have no doubt, in the others—different classifiers would not 
reach significantly different results. Dr Jane Worcester kindly sent me a copy of the working 
sheet allocating the 96 sets of three cases. Of the 75 attributed to the Np³ group, the greatest
interval between the first and third in series was 6 days (two instances) and there were eight 
instances of an interval of 4 days. On any reasonable hypothesis of short incubation most of 
these must be attributed to the primary, and even if all were classed as intrafamiliar, the fit 
to the chain hypothesis would not be materially improved. There is a still more conclusive 
argument. The simple chain hypothesis assumes that the distribution of cases due to the 
primary is a binomial distribution; in the Providence experience it is not. 

This subject has been taken up in a recent paper by Prof. E. B. Wilson (1947). In a set of
519 families with two susceptibles other than the primary, there were 49 0’s, 102 1’s and 368
2’s. Take the ratio of total cases to population, 0.807, as p, and n = 2. The binomial distribution
is 19, 162 and 338. In this particular set, the fit of the gross totals to the chain hypothesis was
poor, P = 0.02, but the binomial fails just as conspicuously when applied to the primary
distribution for 416 families which gave P = 0.94 for the chaining method when used on
massed frequencies. Here the gross totals were 51, 67 and 298. Of the 298 sets of two cases,
36 were found not to be due to the primary, so that the distribution due to the primary is:
0, 51; 1, 103; 2, 262. The binomial distribution for p = 0.7536 is 25.2, 154.5 and 236.3, which
is quite hopeless. This at once suggested what I should have seen before posing the
hypothesis: if the families are heterogeneous in respect of risk, the summed frequencies
obtained by adding 0’s, 1’s, 2’s, etc., and deducing a p from the massed data could not be
a straight binomial. This was first pointed out, I think, by Karl Pearson (1917). It is easy to
see that, if we add in this way, the variance obtained will not be npq, where p and q are the
weighted means of the several values for the summed binomials, but will exceed npq by

$$n(n-1)\left[\frac{S(m_s p_s^2)}{N} - \left(\frac{S(m_s p_s)}{N}\right)^2\right], \tag{1}$$

where n is the exponent and mₛ and pₛ the number of observations and value of p for the sth
binomial; N = S(mₛ). Clearly this only vanishes if the variance of p vanishes.
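Formula (1) is the between-groups part of the variance of a pooled set of binomials. A minimal check, assuming groups given as hypothetical (mₛ, pₛ) pairs: compute the pooled variance directly from each group’s binomial moments and compare it with npq plus (1). The same sketch computes straight-binomial expected frequencies, which reproduce the two fits quoted above to within rounding. All function names are hypothetical.

```python
from math import comb


def binomial_expected(N, n, p):
    """Expected numbers of 0, 1, ..., n cases in N families for a straight binomial."""
    q = 1.0 - p
    return [N * comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]


def pooled_variance(n, groups):
    """Variance of the summed frequencies of Bin(n, p_s) groups; groups = [(m_s, p_s), ...]."""
    N = sum(m for m, _ in groups)
    mean = sum(m * n * p for m, p in groups) / N
    # E[x^2] for Bin(n, p) is npq + (np)^2
    second = sum(m * (n * p * (1 - p) + (n * p) ** 2) for m, p in groups) / N
    return second - mean ** 2


def excess_over_npq(n, groups):
    """Formula (1): n(n - 1) times the weighted variance of p."""
    N = sum(m for m, _ in groups)
    mean_p = sum(m * p for m, p in groups) / N
    mean_p2 = sum(m * p * p for m, p in groups) / N
    return n * (n - 1) * (mean_p2 - mean_p ** 2)


print([round(e, 1) for e in binomial_expected(519, 2, 0.807)])
print([round(e, 1) for e in binomial_expected(416, 2, 0.7536)])

# Hypothetical mixture of three binomials with n = 3
groups = [(40, 0.9), (35, 0.6), (25, 0.2)]
n = 3
N = sum(m for m, _ in groups)
p_bar = sum(m * p for m, p in groups) / N
print(pooled_variance(n, groups), n * p_bar * (1 - p_bar) + excess_over_npq(n, groups))
```

The last line prints the same variance twice, once computed directly and once as npq plus (1).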

Is this, however, anything more than a debating point? I think it is, for this reason. It is
hard to doubt that measles is conveyed by droplet infection and, if it is, the contiguity of the
susceptibles to the source of infection must be a factor of the attack rate. So far as I know,
there are no published data giving attack rates on groups of n susceptibles, primary attack
rates, tabulated by numbers of rooms occupied. That mortality rates are negatively correlated
with social status is, of course, a commonplace of epidemiological literature and the usual
explanation is that in poor families children are exposed to infection at an earlier age than in
the families of the well-to-do; the fatality rate of measles diminishes steeply with advancing
age. No doubt it is implicit in this argument that infection is increased by overcrowding,
but without data we certainly cannot say that in economically homogeneous data the
distribution of primary infections is binomial. Without fresh data, we can do no more than
test whether the, by hypothesis, heterogeneous aggregate can be subdivided into constituent
binomials. At first this seems trivial; if there are s subgroups and s values of p and the only
conditions imposed are that the mean value of p and the variance of p must be reproduced
and, of course, that all the p’s must be positive and less than unity, the number of solutions
must be great. However, there are two other ‘common sense’ restrictions. The subfrequencies
must be reasonable; the frequency distribution of persons per room must be unimodal and
the p’s must decrease as the number of persons per room decreases. Take Wilson’s set of 519
families of two susceptibles. The variance of the aggregate (49 0’s, 102 1’s, 368 2’s) is 0.42569,
which exceeds the binomial variance, 2 × 0.807 × 0.193, by 0.11419, so if this aggregate is a
sum of binomial frequencies the variance of p is 0.05709. I split up the total of 519 families
into six subfrequencies, 47, 189, 147, 81, 25, 30. This is obtained from the proportional
frequencies of families of three living in 1, 2, 3, 4, 5 and 6 or more rooms in the Metropolitan
Borough of St Pancras in 1931. I then adjusted the p’s on this principle: that the first should
be unity, i.e. that in the very crowded tenements both children should be infected, that
in the most spacious tenements neither child should be infected, and that the p’s should
decrease from 1 to 0. Actually if the intervening p’s are 0.9, 0.87, 0.795 and 0.389, the variance
of p is 0.057 and the aggregate subtotals are closely reproduced (47, 106, 366). Of course, many
other solutions are practicable; the only value of the trial is that there is nothing plainly
preposterous in the run of the p’s.
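The arithmetic of the paragraph above can be retraced in a few lines: the variance of the 519-family aggregate, its excess over the binomial variance (with p rounded to 0.807, as in the text), the implied variance of p from formula (1) with n = 2, and the subtotals produced by the six sub-binomials. This is a sketch; the helper name is hypothetical and the variance uses divisor N.

```python
def count_mean_var(freqs):
    """Mean and variance (divisor N) when freqs[k] families have k cases."""
    N = sum(freqs)
    mean = sum(k * f for k, f in enumerate(freqs)) / N
    var = sum(f * (k - mean) ** 2 for k, f in enumerate(freqs)) / N
    return mean, var


aggregate = [49, 102, 368]  # 0, 1, 2 cases among two susceptibles
n = 2
mean, var_agg = count_mean_var(aggregate)
p_bar = 0.807  # rounded, as in the text
excess = var_agg - n * p_bar * (1 - p_bar)
var_p = excess / (n * (n - 1))  # formula (1) solved for the variance of p
print(round(var_agg, 5), round(excess, 5), round(var_p, 5))

# The six sub-binomials tried in the text, from crowded to spacious housing
sizes = [47, 189, 147, 81, 25, 30]
ps = [1.0, 0.9, 0.87, 0.795, 0.389, 0.0]
subtotals = [0.0, 0.0, 0.0]
for m, p in zip(sizes, ps):
    q = 1.0 - p
    subtotals[0] += m * q * q
    subtotals[1] += m * 2 * p * q
    subtotals[2] += m * p * p
print([round(s) for s in subtotals])  # compare with 47, 106, 366
```

The first line of output agrees with the quoted 0.42569, 0.11419 and 0.05709 to within a unit in the last place.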

The obvious criticism from the practical point of view is that to reach homogeneity requires
an enormous mass of data. Suppose what I am suggesting were done (in the records
of public health departments in this country and in the United States an immense amount of 
information lies unpublished) there would still remain possibly relevant heterogeneity. Still, 
the problem is an interesting one; that the primary distribution should be binomial, or 
approximately binomial, is a seductive hypothesis. 

But it may fairly be objected that we have the massed data and it is interesting to try to
interpret them. Wilson has made two suggestions. We infer from the data that measles does
not spread within the family as if the children were independently infected and there are at
least two ways of characterising the dependence. The first may be described in his own words 
(the illustration is the set of 519 just discussed). 

One way to express dependence of elements is to compute the number which would be required by the 
theory of chance to explain the observed standard deviation. The actual secondary attack rates in the 
families with 0, 1, 2 secondary cases are 0, 0.5, 1.0 respectively; their mean is 0.807 and their standard
deviation squared (variance) is 0.106. If this be equated to pq/n we find n = 1.46. Thus the two susceptibles
in the family are behaving relative to the contracting or escaping the infection as though they were
about one and one-half.

I have substituted 0.106 for 0.016, which is an evident misprint. Here p is the mean value of p.
The plan is to substitute for the observed table a binomial the exponent of which is deduced 
from the variance of p. I have a certain awe of binomials the exponents of which are not 
integers, perhaps owing to a recollection of v. Bortkiewicz’s furious polemic entitled 
Realismus und Formalismus in der mathematischen Statistik (1918) directed against a paper 
by L. Whitaker (1914). The last-named author had found better fits to some data used by 
v. Bortkiewicz to illustrate the Poisson series by using fractional or negative exponents to 
binomials. 
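Wilson’s device treats the family attack rates k/n as a sample and asks for the exponent that would give their observed variance under the theory of chance, n = pq/variance. A short sketch (the helper name is hypothetical) applied to the set of 519 families above and to the set of 100 families with three susceptibles given below:

```python
def effective_exponent(freqs, n):
    """Return (mean attack rate, its variance, pq / variance).

    freqs[k] is the number of families with k secondary cases among n susceptibles.
    """
    N = sum(freqs)
    rates = [k / n for k in range(len(freqs))]
    mean = sum(f * r for f, r in zip(freqs, rates)) / N
    var = sum(f * (r - mean) ** 2 for f, r in zip(freqs, rates)) / N
    return mean, var, mean * (1 - mean) / var


print(effective_exponent([49, 102, 368], 2))   # Wilson's 519 families
print(effective_exponent([4, 11, 18, 67], 3))  # Wilson's 100 families
```

For the first set this gives a mean near 0.807, a variance near 0.106 and an exponent near 1.46; for the second, an exponent near 1.817.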

Taking Wilson’s example, we have

$$(0.8073 + 0.1927)^{1.46227} = 0.73124\,(1 + 0.2387)^{1.46227},$$

and the successive terms are

0.73124, 0.25523, 0.01408, −0.000603, 0.000056, etc.

Now if we use these as frequencies for variates 1.462, 0.462, −0.538, −1.538, −2.538, etc.,
and compute the mean and variance, we reproduce the proper mean and variance but do not
reproduce the frequencies. Ignoring terms after the third (the sum of the first three terms is
1.00055), we find for expected frequencies 379.5, 132.5 and 7.3, not much better than the
integral binomial.

Wilson gives a set of 100 families with three susceptibles, as follows:

| No. attacked | 3 | 2 | 1 | 0 |
|---|---|---|---|---|
| Frequency | 67 | 18 | 11 | 4 |

This time the mean p is 0.826667 and its variance 0.07884, so that n = 1.81745, giving for the
binomial $0.70754\,(1 + 0.20967)^{1.81745}$. The sum of the first five terms is 1.00016; neglecting the
rest, the mean and variance are correctly obtained to 3 places for the mean and 2 for the variance,
but the frequencies are poorly reproduced, namely 70.7 for 67, 26.9 for 18, 2.3 for 11 and a
small negative value for 4.
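Binomials with a fractional exponent can be expanded term by term, since each coefficient is the previous one multiplied by (n − j + 1)/j. The sketch below (hypothetical function name) writes (p + q)ⁿ as pⁿ(1 + q/p)ⁿ, as in the text, and returns the first few terms for both of Wilson’s sets; unlike the integral case the series does not stop, and the terms soon change sign.

```python
def fractional_binomial_terms(p, n, count):
    """First `count` terms of p**n * (1 + q/p)**n for a possibly non-integral n."""
    q = 1.0 - p
    ratio = q / p
    term = p ** n
    terms = [term]
    for j in range(1, count):
        term *= (n - j + 1) / j * ratio  # generalised binomial coefficient step
        terms.append(term)
    return terms


t519 = fractional_binomial_terms(0.8073, 1.46227, 5)
print([round(t, 6) for t in t519])
print([round(519 * t, 1) for t in t519[:3]])  # expected frequencies from the first three terms

t100 = fractional_binomial_terms(0.826667, 1.81745, 5)
print([round(100 * t, 1) for t in t100[:4]])
```

The negative fourth term for the 100-family set is the ‘small negative value’ set against the observed frequency of 4.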

A better method, as Wilson suggests, is to use the method of association. When there are
two susceptibles, call them A and B and form the table

| | A + | A − |
|---|---|---|
| B + | (AB) | (aB) |
| B − | (Ab) | (ab) |

In this (AB) means the frequency of both susceptibles going down, (aB) the frequency of A
not falling sick but B doing so, etc. In such a table as that containing the set of 519 already
discussed one cannot distinguish A from B (which, as Wilson notes, might be done if one
used, for instance, age of each member of a pair as a distinction), so A must be put equal
to B. One can then proceed on Yule’s lines for associated attributes. In the case before us
the secondary attack rate is 0.807, but if one of the pair is attacked, the chance the other will
be attacked is 0.88, while if one of the pair escapes the chance the other will escape is 0.49.
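The two conditional chances follow from the 2 × 2 table once the families with one case are split equally between (aB) and (Ab), which is what putting A equal to B amounts to. A minimal sketch with a hypothetical function name; Yule’s coefficient of association Q, which the text does not quote, is added only as an illustration of proceeding ‘on Yule’s lines’.

```python
def pair_association(n0, n1, n2):
    """2 x 2 association for families of two susceptibles that cannot be told apart.

    n0, n1, n2: families with 0, 1, 2 secondary cases. Families with one case are
    split equally between (aB) and (Ab).
    """
    AB, ab = n2, n0
    aB = Ab = n1 / 2
    attack_rate = (2 * AB + aB + Ab) / (2 * (AB + aB + Ab + ab))
    attacked_if_other_attacked = AB / (AB + aB)  # P(A attacked | B attacked)
    escaped_if_other_escaped = ab / (ab + Ab)    # P(A escapes | B escapes)
    yule_q = (AB * ab - aB * Ab) / (AB * ab + aB * Ab)
    return attack_rate, attacked_if_other_attacked, escaped_if_other_escaped, yule_q


print([round(x, 3) for x in pair_association(49, 102, 368)])  # [0.807, 0.878, 0.49, 0.748]
```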

It occurred to me that another way of bringing out Wilson’s point would be to use the
theorem that if the events are not independent, σ² = npq{1 + r(n − 1)}, where r is the arithmetic
mean of the ½n(n − 1) correlations of the variables, viz. the product-moment correlations
of variables restricted to the values 0 or 1.
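Solving σ² = npq{1 + r(n − 1)} for r gives the mean correlation directly from the observed variance of the number of cases. A sketch (hypothetical helper name), applied to Wilson’s 100 families and to the 519 families with two susceptibles:

```python
def mean_correlation(freqs, n):
    """Solve sigma^2 = npq(1 + r(n - 1)) for the average correlation r.

    freqs[k] is the number of families with k cases among n susceptibles.
    Returns (p, r).
    """
    N = sum(freqs)
    mean = sum(k * f for k, f in enumerate(freqs)) / N
    var = sum(f * (k - mean) ** 2 for k, f in enumerate(freqs)) / N
    p = mean / n
    npq = n * p * (1 - p)
    return p, (var / npq - 1) / (n - 1)


print(mean_correlation([4, 11, 18, 67], 3))  # Wilson's 100 families
print(mean_correlation([49, 102, 368], 2))   # Wilson's 519 families
```

For n = 2 the value obtained is simply the product-moment correlation of the 2 × 2 table with (aB) = (Ab), here 15431/41900, about 0.368.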

Suppose we have to do with n correlated ‘events’ of this type; we are concerned with a
succession of 0’s and 1’s, and for each of the n items the mean is p and the variance pq. Hence,
if we estimate the expectation of the tth event from the results of the preceding t − 1 events,
each regression coefficient will be a coefficient of partial correlation (for all variances are
equal) and we have

$$x_t - p = \rho_{t1\cdot 23\ldots}(x_1 - p) + \rho_{t2\cdot 13\ldots}(x_2 - p) + \cdots.$$

Now if we put in this equation the values of x₁, x₂, etc. (the values must all be 1’s or 0’s), we
reach, let us say, the value k. This will be its expectation. But, as xₜ must be either 0 or 1, this
amounts to saying that the probability that xₜ will be 1 is k and the probability it will be 0 is
1 − k. The equation is, however, useless unless we know the values of the partial correlations.
Let us suppose they are all equal. In the particular case of n = 3 (the only one I shall
discuss) they would each be r/(1 + r). The first event is 1 or 0 with probability p or q. The
regression equation of the next event x₂ on x₁ is x₂ = rx₁ + p(1 − r), giving for x₁ = 1, r + p(1 − r),
and for x₁ = 0, p(1 − r); so the probability of (11) is p² + pqr, of (10) pq(1 − r), of (00) q² + rpq and of
(01) pq(1 − r).

If x₁ and x₂ are given, the regression of x₃ on them is

$$x_3 = x_1\,\frac{r}{1+r} + x_2\,\frac{r}{1+r} + p\left(1 - \frac{2r}{1+r}\right),$$

and the 8 probabilities can be calculated. For instance, the probability of the succession (010) is

$$pq(1-r)\left\{1 - \frac{r}{1+r} - p\left(1 - \frac{2r}{1+r}\right)\right\} = \frac{pq}{1+r}\{(1-r)(q+pr)\} = \frac{q}{1+r}\{(p-pr)(q+pr)\}.$$

The expectations of (010), (001) and (100) are, of course, the same, so the probability of 1
success is

$$\frac{3pq}{1+r}\{(1-r)(q+pr)\}.$$

In this way, we reach for the complete ‘successes’ distribution

$$
\begin{aligned}
0 &: \quad \frac{q}{1+r}\,(q+rp)(q+rp+r),\\
1 &: \quad \frac{3pq}{1+r}\,(q+rp)(1-r),\\
2 &: \quad \frac{3pq}{1+r}\,(p+rq)(1-r),\\
3 &: \quad \frac{p}{1+r}\,(p+qr)(p+qr+r),
\end{aligned}
$$

leading to an area of unity, a mean of 3p and a variance of 3pq(1 + 2r). One can proceed in
this way step by step to deduce the 16 frequencies for the case n = 4 and so on.
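The step-by-step construction can be checked by brute force: build each of the 8 successions from the regression estimates, add them by number of successes, and compare with the closed-form distribution. A sketch for n = 3 under the equal-correlation assumption of the text; the function names are hypothetical, and p, r are the values used for Wilson’s 100 families.

```python
from itertools import product


def prob_next_is_one(history, p, r):
    """Regression estimate of P(next event = 1) given earlier 0/1 results (n = 3).

    With one earlier result the coefficient is r; with two, each equal
    partial coefficient is r / (1 + r).
    """
    if len(history) == 0:
        return p
    if len(history) == 1:
        return r * history[0] + p * (1 - r)
    b = r / (1 + r)
    return b * history[0] + b * history[1] + p * (1 - 2 * b)


def successes_by_enumeration(p, r):
    """Distribution of 0-3 successes built from the 8 successions."""
    dist = [0.0] * 4
    for seq in product((0, 1), repeat=3):
        prob = 1.0
        for t, x in enumerate(seq):
            k = prob_next_is_one(seq[:t], p, r)
            prob *= k if x == 1 else 1 - k
        dist[sum(seq)] += prob
    return dist


def successes_closed_form(p, r):
    q = 1 - p
    return [
        q * (q + r * p) * (q + r * p + r) / (1 + r),
        3 * p * q * (q + r * p) * (1 - r) / (1 + r),
        3 * p * q * (p + r * q) * (1 - r) / (1 + r),
        p * (p + q * r) * (p + q * r + r) / (1 + r),
    ]


p, r = 0.82667, 0.32538
enum = successes_by_enumeration(p, r)
closed = successes_closed_form(p, r)
print([round(100 * x, 2) for x in closed])
```

The two lists agree, and the first three expected numbers per 100 families come out at 4.44, 9.68 and 19.32.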

Dr J. O. Irwin has, however, obtained an elegant general solution which he will, I hope,
publish, for it may be of value in other investigations. In these days of small families, one
will not often have sets of more than 3 susceptible children apart from the first infected in
measles, and I confine myself to trying out the method on Wilson’s 100 sets of 3 susceptibles
beyond the first. Here p = 0.82667 and r = 0.32538, giving:


| Cases | Observed | Expected |
|---|---|---|
| 0 | 4 | 4.44 |
| 1 | 11 | 9.68 |
| 2 | 18 | 19.32 |
| 3 | 67 | 66.66 |

χ² = 0.7091 and P (for one degree of freedom) = 0.40.
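The P attached to a χ² value is the upper tail of the χ² distribution with the stated degrees of freedom. A stdlib-only sketch for integer degrees of freedom, built on the standard recurrence for the regularized upper incomplete gamma function; the function name is hypothetical, and in practice a statistics library routine would normally be used instead.

```python
import math


def chi2_sf(x, df):
    """P(chi-square with df degrees of freedom >= x), for a positive integer df.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1) with y = x / 2,
    starting from Q(1, y) = exp(-y) (even df) or Q(1/2, y) = erfc(sqrt(y)) (odd df).
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-y)
    else:
        a, tail = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        tail += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return tail


print(round(chi2_sf(0.7091, 1), 2))  # 0.4
```

For one degree of freedom the tail is simply erfc(√(χ²/2)), which gives the P = 0.40 quoted above.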

One has a temptation to speculate on the results of not assuming that the correlations are
equal, but it would be pure speculation and, although the method interests me, I am not sure
that it is better than the straightforward calculations Wilson proposes, although it does permit
of deducing a χ² which, for old acquaintance’ sake, is pleasing to me. As I have indicated
earlier in this paper, one needs more precise and homogeneous data than the aggregated
results of a city survey furnish. The comparison of two aggregates is subject to another
difficulty to which Wilson and his associates draw special attention (Wilson et al. 1938,
pp. 443-4). Neither the St Pancras data of Stocks upon which I worked, nor those of
Providence, R.I., are complete records of the occurrence of measles in the areas. Stocks
estimated that as many as 70% of the cases occurring were reported; Wilson and his
colleagues put the proportion in Providence, R.I., no higher than 50%. The secondary attack
rates in Providence were a good deal higher than in St Pancras. Prima facie measles was
much less infectious under the conditions of family life in St Pancras than under those of
Providence, R.I. Wilson et al. write:

One need not overlook the possibility that with only 47 per cent reported in Providence, it may be that those families with a large number of secondary cases in proportion to their total susceptibles are disproportionately frequent in the reported as compared with the unreported families. The discrepancy, however, between the secondary attack rate in St Pancras and Providence is so great that the rates cannot well be reconciled on the hypothesis that there is a differential in favour of higher attack rates in the reported families in Providence unless one assumes a differential in favour of lower attack rates in the reported as compared with the unreported families in St Pancras. This leaves a quite enigmatic situation and makes any comparisons problematical (op. cit. p. 444).

I have nothing to add to this clear statement. One might surmise that in a family in which many children come down together or within a short time, i.e. are infected from an extra-familial source or from the first in series, domestic help may be more urgently needed than when the disease is passed on by chaining. But I can see no reason why such a bias, if it exists, should be more effective in St Pancras than in Providence, R.I.


REFERENCES 

Greenwood, M. (1931). J. Hyg., Camb., 31, 335-51.

Pearson, K. (1917). Biometrika, 11, 139-44.

Pickles, W. N. (1939). Epidemiology in Country Practice. Bristol.

Schilling, W. (1947). J. Amer. Statist. Ass. 42, 407-24.

Stocks, P. & Karn, M. N. (1928). Ann. Eugen., Lond., 3, 361-98.

von Bortkiewicz, L. (1918). Allgem. Statist. Arch. 9, 225.

Whitaker, L. (1914). Biometrika, 10, 36-71.

Wilson, E. B. (1947). Proc. Nat. Acad. Sci., Wash., 33, 68-72.

Wilson, E. B., Bennett, C., Allen, A. & Worcester, J. (1938). Proc. Amer. Philos. Soc. 80, 369-766.






A NOTE ON THE ANALYSIS OF GROUPED PROBIT DATA 
By K. D. TOCHER, B.Sc., National Physical Laboratory
Introduction 

If the members of a group of objects are subjected to a common level of a certain stimulant 
and the number affected recorded, the proportion of these to the total is a measure of the 
effectiveness of the stimulant. If such data are available for several levels of the stimulant it 
is desirable to find a method of describing the effect in terms of the level of stimulant. 

Probit analysis achieves this by assuming that there is a critical level of stimulant associated
with each object, being the least level giving an effect, all higher levels being assumed
effective. The objects are supposed drawn at random from a population with a normal 
distribution of critical levels.* The relation between level and effect is completely specified by 
the mean and variance of this distribution. The statistical problem is reduced to estimating 
these parameters and assigning confidence limits on the level giving a fixed proportion of 
responses on randomly chosen subjects. 

Many other real experiments may be reduced to this pattern by a change of language. 
A common variant in psychological work is obtained by subjecting groups of children of 
a common age to an intelligence test. The level of stimulant is replaced by the common age of 
the group, and the effect becomes the proportion passing the test. 

Whereas, in the usual probit problem, the level of stimulant is under the experimenter’s 
control and can be fixed exactly equal for each member of a group, the children in the groups 
will not all be exactly the same age. If each child is regarded as a group of one the usual 
technique may be applied, but the actual calculations are tedious due to the unequal spacings 
of the ages and the large number of groups. For this method to be applied the actual ages of
the children must be known, and this is not always the case. 

Thus it is desirable, and, in some cases, necessary, to have a method of analysis of grouped 
data. This paper derives a technique of achieving this and shows that an approximation to 
the correct maximum likelihood estimates can be obtained by applying Sheppard’s corrections to the variance estimated by neglecting the grouping in a normal probit analysis.
Examples of the technique are given. 


The exact maximum likelihood estimates 

Suppose there are m groups, the sth group containing $n_s$ members, with ages uniformly distributed in the range $(\xi_s, x_s)$, of which $r_s$ pass the test ($s = 1, 2, \ldots, m$).

If the critical age x has a normal distribution with mean μ and variance σ², then

$$Y = Y_x = \frac{x - \mu}{\sigma} = ax + b \ \text{(say)} \qquad (1)$$

has a standardized normal distribution.

* In some cases some simple transform of level, such as the logarithm, is assumed to have this normal 
distribution. 



10 A note on the analysis of grouped probit data 

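The probability that a child of age u passes the test is

$$P(Y_u) = \int_{-\infty}^{au+b} Z(t)\,dt, \qquad (2)$$

and hence that for a child in the sth group is

$$P_s = \frac{1}{\Delta_s}\int_{\xi_s}^{x_s} P(au + b)\,du = \frac{\phi(x_s) - \phi(\xi_s)}{\Delta_s}, \qquad (3)$$

where

$$\Delta_s = x_s - \xi_s \qquad (4)$$

and

$$\phi(x) = \int_{-\infty}^{x} P(au + b)\,du. \qquad (5)$$

The usual iterative solution for the maximum likelihood estimates a, b uses the mean-value theorem identities

$$\frac{\partial L}{\partial a_0} = (a_0 - a)\frac{\partial^2 L}{\partial a^2} + (b_0 - b)\frac{\partial^2 L}{\partial a\,\partial b}, \qquad \frac{\partial L}{\partial b_0} = (a_0 - a)\frac{\partial^2 L}{\partial a\,\partial b} + (b_0 - b)\frac{\partial^2 L}{\partial b^2},$$

where L is the likelihood of the result on the values a, b; $\partial L/\partial a_0$, $\partial L/\partial b_0$ are the partial derivatives of L with respect to a, b respectively at the values $a_0$, $b_0$; and $\partial^2 L/\partial a^2$, $\partial^2 L/\partial a\,\partial b$, $\partial^2 L/\partial b^2$ are the second derivatives of L.

These identities only hold exactly if the second-order derivatives are evaluated at the appropriate point within the square with diagonal $(a_0, b_0)(a, b)$. Since this point remains unknown, approximate values of the second-order derivatives are used, and the process is repeated on the resulting estimates until these no longer alter. The usual approximations used are the expected values of the second-order derivatives at $(a_0, b_0)$.

Now

$$L = C + \sum_{s=1}^{m}\{r_s \log P_s + (n_s - r_s)\log Q_s\}, \qquad (6)$$

where $Q_s = 1 - P_s$, so that

$$\frac{\partial L}{\partial \alpha} = \sum_{s=1}^{m}\left(\frac{r_s}{P_s} - \frac{n_s - r_s}{Q_s}\right)\frac{\partial P_s}{\partial \alpha} \qquad (\alpha = a, b), \qquad (7)$$

$$E\left(\frac{\partial^2 L}{\partial \alpha\,\partial \beta}\right) = -\sum_{s=1}^{m}\frac{n_s}{P_s Q_s}\,\frac{\partial P_s}{\partial \alpha}\,\frac{\partial P_s}{\partial \beta} \qquad (\alpha, \beta = a, b). \qquad (8)$$

We require expressions for $P_s$, $\partial P_s/\partial a$, $\partial P_s/\partial b$. Integration by parts gives

$$\phi(x) = \frac{1}{a}\int_{-\infty}^{Y_x} P(t)\,dt = \frac{1}{a}\{Y_x P(Y_x) + Z(Y_x)\}, \qquad (9)$$

where

$$Z(Y) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}Y^2}. \qquad (10)$$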



K. D. Tocher 


11 


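$P_s$ is obtained from (3) and (9). Also, since $d\{YP(Y) + Z(Y)\}/dY = P(Y)$,

$$\frac{\partial \phi(x)}{\partial a} = -\frac{1}{a^2}\{YP(Y) + Z(Y)\} + \frac{1}{a}P(Y)\frac{\partial Y}{\partial a} = \frac{1}{a^2}\{axP(Y) - YP(Y) - Z(Y)\} = -\frac{1}{a^2}\{bP(Y) + Z(Y)\}, \qquad (11)$$

$$\frac{\partial \phi(x)}{\partial b} = \frac{1}{a}P(Y)\frac{\partial Y}{\partial b} = \frac{1}{a}P(Y), \qquad (12)$$

the suffix x on Y being understood.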


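Clearly

$$\frac{\partial P_s}{\partial \alpha} = \frac{1}{\Delta_s}\left[\frac{\partial \phi(x)}{\partial \alpha}\right]_{\xi_s}^{x_s} \qquad (\alpha = a, b), \qquad (13)$$

and this with (11) and (12) gives $\partial P_s/\partial a$, $\partial P_s/\partial b$.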
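The closed forms (9), (11), (12) and (13) are easy to check numerically. The sketch below is an illustration, not part of the original paper: it assumes ages uniform on a group (lo, hi), takes P(Y) from `math.erfc`, and the function names (`phi`, `group_prob` and so on) are hypothetical. A midpoint-rule average of P(au + b) over the group, and central finite differences in a and b, should agree with the closed forms.

```python
import math


def norm_cdf(y):
    """P(Y): standard normal distribution function."""
    return 0.5 * math.erfc(-y / math.sqrt(2.0))


def norm_pdf(y):
    """Z(Y): standard normal ordinate, equation (10)."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def phi(x, a, b):
    """Equation (9): phi(x) = {Y P(Y) + Z(Y)} / a, with Y = a x + b."""
    y = a * x + b
    return (y * norm_cdf(y) + norm_pdf(y)) / a


def dphi_da(x, a, b):
    """Equation (11): d phi / d a = -{b P(Y) + Z(Y)} / a**2."""
    y = a * x + b
    return -(b * norm_cdf(y) + norm_pdf(y)) / a**2


def dphi_db(x, a, b):
    """Equation (12): d phi / d b = P(Y) / a."""
    return norm_cdf(a * x + b) / a


def group_prob(lo, hi, a, b):
    """Equations (3) and (13): P_s and its derivatives for ages uniform on (lo, hi)."""
    width = hi - lo
    p = (phi(hi, a, b) - phi(lo, a, b)) / width
    dp_da = (dphi_da(hi, a, b) - dphi_da(lo, a, b)) / width
    dp_db = (dphi_db(hi, a, b) - dphi_db(lo, a, b)) / width
    return p, dp_da, dp_db
```

For instance, `group_prob(4.5, 5.5, 0.70, -3.70)` gives $P_s$ and its two derivatives for the age-14 group of the numerical example (ages measured from 9 years) at the trial values.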

The quantities required for the first step in the iteration can now be calculated, and with the new values of a and b so obtained the process is repeated. A slight saving of labour results from using the initial values of the second derivatives in the successive steps rather than recalculating them, without any material increase in the number of steps required.


A numerical example


In the particular but important case in which the groups are of equal width, non-overlapping and completely covering a range of ages, the calculations can be reduced to a reasonable tabular form, shown for an example in Table 2.

In this case the centres of the age groups are separated by equal distances, which may be used as a unit of age. The upper end-point of one group is the lower end-point of the next, and $P_s$, with its derivatives, can be formed by differencing φ(x) and its derivatives.

(1) For each end-point the following functions are tabulated using the trial values of a, b: (i) Y, (ii) P, (iii) Z, (iv) YP + Z, (v) bP + Z.

(2) The differences of (iv) are aP, from which aQ is easily obtained (cols. (vi) and (vii), Table 2).

(3) Using the data, $S = \dfrac{r}{aP} - \dfrac{n - r}{aQ}$ is then tabulated for each group (cols. (viii) and (ix)).

(4) (ii) and (v) are differenced, giving $a\,\partial P/\partial b$ and $-a^2\,\partial P/\partial a$ respectively (cols. (x) and (xi)).

(5) The scalar products of these two columns with S give $\partial L/\partial b$ and $-a\,\partial L/\partial a$.

(6) $n/\{(aP)(aQ)\}$ is formed as col. (xii).

(7) The products of (xii) with (x) and (xi) are tabulated (cols. (xiii) and (xiv)).

(8) The scalar products of (x) and (xi) with (xiii) and (xiv) give $-\partial^2 L/\partial b^2$, $a\,\partial^2 L/\partial a\,\partial b$ and $-a^2\,\partial^2 L/\partial a^2$. The mixed derivative is formed twice as a check.

(9) Solving the system of linear equations in the usual way, corrections to the trial (a, b) are found, and the process is repeated as far as stage (5) as often as necessary.




The usual process of reducing the calculation to a weighted regression line is not possible 
with grouped data, as P is not simply a function of Y but also depends explicitly on a. 

The calculation is now illustrated with an example. Table 1 gives the number of children in various groups who show certain secondary sex characteristics, together with the total number in each class.


Table 1. Results of examination of children for secondary sex characteristics classified by age

Age group*                                  9    10    11    12    13    14    15    16
No. in group                               48   100    95   100    91    94    56    33
No. with sex characteristics in group       0     0     1     5    13    47    38    29

* Children whose ages are x years ± 6 months are in age group x.


Converting the proportions of sex-characterized children into normal equivalent deviates, and plotting against age, we obtain as rough estimates

a ≈ 0.70, b ≈ -3.70 (using an origin of 9 years).

The end-points are 8½, 9½, ..., 16½ years. Table 2 contains the necessary calculations, as outlined above, for the first step of the iteration arranged in tabular form. This gives


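$$\frac{\partial L}{\partial a} = 15.7004, \qquad \frac{\partial L}{\partial b} = 0.62830,$$

$$\frac{\partial^2 L}{\partial a^2} = -4066.41, \qquad \frac{\partial^2 L}{\partial b^2} = -174.699, \qquad \frac{\partial^2 L}{\partial a\,\partial b} = -814.832,$$

$$\Delta = \frac{\partial^2 L}{\partial a^2}\,\frac{\partial^2 L}{\partial b^2} - \left(\frac{\partial^2 L}{\partial a\,\partial b}\right)^2 = 4.64476^{4}.\dagger$$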


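The corrections to subtract are:

$$\Delta^{-1} = 2.15301^{-5},$$

$$\delta a = \frac{1}{\Delta}\left(\frac{\partial^2 L}{\partial b^2}\frac{\partial L}{\partial a} - \frac{\partial^2 L}{\partial a\,\partial b}\frac{\partial L}{\partial b}\right) = -2.15301^{-5} \times 2.23089^{3} = -0.048,$$

$$\delta b = \frac{1}{\Delta}\left(\frac{\partial^2 L}{\partial a^2}\frac{\partial L}{\partial b} - \frac{\partial^2 L}{\partial a\,\partial b}\frac{\partial L}{\partial a}\right) = 2.15301^{-5} \times 1.02383^{4} = +0.220.$$

This gives new values of a = 0.748, b = -3.920.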
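The two correction steps can be reproduced from the quoted derivatives alone. A minimal sketch, assuming the mixed derivative is -814.832 and that, as in the text, the second step reuses the first-step second derivatives and each new start is rounded to three decimals; `corrections` is a hypothetical helper that applies Cramer's rule to the mean-value identities.

```python
def corrections(dL_da, dL_db, L_aa, L_ab, L_bb):
    """Solve the mean-value identities for the corrections (da, db) to subtract:
    dL_da = da * L_aa + db * L_ab,  dL_db = da * L_ab + db * L_bb."""
    det = L_aa * L_bb - L_ab**2
    da = (L_bb * dL_da - L_ab * dL_db) / det
    db = (L_aa * dL_db - L_ab * dL_da) / det
    return da, db


# Expected second derivatives from the first step of the example.
L_aa, L_ab, L_bb = -4066.41, -814.832, -174.699

# Step 1 at the rough estimates a = 0.70, b = -3.70.
da1, db1 = corrections(15.7004, 0.62830, L_aa, L_ab, L_bb)
a1, b1 = round(0.70 - da1, 3), round(-3.70 - db1, 3)

# Step 2 reuses the old second derivatives with the new first derivatives.
da2, db2 = corrections(-0.630275, -0.299514, L_aa, L_ab, L_bb)
a2, b2 = round(a1 - da2, 3), round(b1 - db2, 3)

print(f"step 1: da = {da1:.3f}, db = {db1:+.3f}, new start a = {a1}, b = {b1}")
print(f"step 2: da = {da2:.3f}, db = {db2:+.3f}, new start a = {a2}, b = {b2}")
```

Running it should reproduce the rounded corrections and starting values of the first three rows of Table 4.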

Repeating the first step but omitting columns (xii), (xiii), (xiv), which are only needed to determine the second derivatives, we obtain $\partial L/\partial a = -0.630275$ and $\partial L/\partial b = -0.299514$. These give, using the old second derivatives, corrections δa = -0.003, δb = +0.015. The new start is a = 0.751, b = -3.935.

The successive corrections and approximate values are given in Table 4.

† 4.64476⁴ means 4.64476 × 10⁴, and similarly elsewhere.




Table 2. Tabular form of calculation for grouped probit data 












































Table 4. Steps in iteration

Step     Starting values          Corrections (subtract)
           a          b              a            b
1        0.70      -3.70          -0.048       +0.220
2        0.748     -3.920         -0.003       +0.015
3        0.751     -3.935         -0.0003      +0.0022
4        0.7513    -3.9372        -0.00004     +0.0004
5        0.7513    -3.9376        -0.000006    +0.00003


The final maximum likelihood estimates are thus a = 0.7513 and b = -3.9376, obtained in five steps. Thus the distribution of critical ages for showing the sex characteristic in question has a mean of $9 + 3.9376/0.7513 = 14.24$ years and a standard deviation of $1/0.7513 = 1.33$ years.


The number of significant figures retained has been six or seven throughout. It has been found from practical examples that this figuring is required if the derivatives are to be obtained to three or four significant figures. The use of less accurate values would increase the number of steps in the iteration, and it seems more profitable to carry a few extra figures at each stage to avoid these additional steps.
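The whole procedure of Table 2 can be sketched compactly in Python. This is a minimal illustration under stated assumptions, not the author's worksheet: groups are contiguous and of unit width, centred at x = 0, 1, ..., 7 (ages 9 to 16 with origin at 9 years), the counts are those of Table 1, and the expected second derivatives are recomputed at every step instead of being held fixed as the text recommends. The function name `grouped_probit` is hypothetical.

```python
import math


def P(y):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-y / math.sqrt(2.0))


def Z(y):
    """Standard normal ordinate."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def grouped_probit(n, r, a, b, tol=1e-8, max_steps=100):
    """Maximum likelihood a, b for contiguous unit-width groups centred at
    x = 0, 1, ..., len(n) - 1, using expected second derivatives (scoring)."""
    m = len(n)
    for _ in range(max_steps):
        ends = [k - 0.5 for k in range(m + 1)]
        Y = [a * x + b for x in ends]
        iv = [y * P(y) + Z(y) for y in Y]      # column (iv): YP + Z
        v = [b * P(y) + Z(y) for y in Y]       # column (v):  bP + Z
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for s in range(m):
            Ps = (iv[s + 1] - iv[s]) / a               # differences of (iv) are aP
            Qs = 1.0 - Ps
            dPa = -(v[s + 1] - v[s]) / a**2            # from (11) and (13)
            dPb = (P(Y[s + 1]) - P(Y[s])) / a          # from (12) and (13)
            S = r[s] / Ps - (n[s] - r[s]) / Qs
            g_a += S * dPa                             # equation (7)
            g_b += S * dPb
            w = n[s] / (Ps * Qs)                       # equation (8)
            h_aa -= w * dPa * dPa
            h_ab -= w * dPa * dPb
            h_bb -= w * dPb * dPb
        det = h_aa * h_bb - h_ab**2
        da = (h_bb * g_a - h_ab * g_b) / det           # corrections to subtract
        db = (h_aa * g_b - h_ab * g_a) / det
        a, b = a - da, b - db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b


# Table 1, ages 9 to 16 years with origin at 9 years.
n = [48, 100, 95, 100, 91, 94, 56, 33]
r = [0, 0, 1, 5, 13, 47, 38, 29]
a, b = grouped_probit(n, r, a=0.70, b=-3.70)
print(f"a = {a:.4f}, b = {b:.4f}, mean = {9 - b / a:.2f}, sd = {1 / a:.2f}")
```

Recomputing the second derivatives costs a little more arithmetic per step, which is trivial on a computer; holding them fixed, as the text does, should change only the rate of convergence, not the limit.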


An approximate solution

Each step in the iteration proposed and illustrated in the last section is a lengthy calculation and renders the whole process very tedious. If a simple method can be found for reaching an approximate solution of greater accuracy than the usual trial guess, the number of steps of the form above can be reduced to one or two. Since the data are formed by grouping probit data, corrections to the ungrouped answers would seem a promising approach.

We consider the case of equal-width groups, the group width being used as the unit of age, although the groups need not completely cover any range.

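Expanding the integrand of (3) as a Taylor series about the centre of the group, we have

$$P_s = \int_{-1/2}^{1/2} P(Y + at)\,dt = P(Y) + \frac{a^2}{24}\,\frac{d^2P(Y)}{dY^2} + \cdots = P(Y) - \frac{a^2}{24}\,YZ(Y) + \cdots,$$

and, substituting in (7),

$$\frac{\partial L}{\partial \alpha} = \sum\left(\frac{r}{P} - \frac{n - r}{Q}\right)Z\,\frac{\partial Y}{\partial \alpha} + \frac{a^2}{24}\left[\,\cdots\,\right],$$

where Q = 1 - P, the argument Y in P and Z is understood, and $a^2/24$ is assumed small,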






where α = a or b, and the summation is over a suffix s, dropped to shorten the notation. The terms in square brackets may, for $a^2/24$ small, be regarded as corrections to the first term, which is the value of $\partial L/\partial \alpha$ obtained treating the data as ungrouped. We replace the correction terms by their expected values, obtaining

$$\frac{\partial L}{\partial \alpha} = \frac{\partial L'}{\partial \alpha} + \frac{a^2}{24}\sum\frac{nZ^2}{PQ}\,Y\,\frac{\partial Y}{\partial \alpha},$$

L′ being the likelihood calculated on the assumption of ungrouped data.
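The second-order expansion of $P_s$ can be compared with the exact group average from (9). A small sketch, assuming a unit-width group centred at Y and a slope near the example's a = 0.75; the helper names are hypothetical. The remaining error is of order $a^4/1920$, well below the $a^2/24$ correction itself.

```python
import math


def norm_cdf(y):
    return 0.5 * math.erfc(-y / math.sqrt(2.0))


def norm_pdf(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def exact_group_prob(y_centre, a):
    """Average of P over a unit-width group whose centre has deviate y_centre,
    using phi from (9); the group runs from y_centre - a/2 to y_centre + a/2."""
    def g(y):  # a * phi, written as a function of Y
        return y * norm_cdf(y) + norm_pdf(y)
    return (g(y_centre + a / 2) - g(y_centre - a / 2)) / a


def taylor_group_prob(y_centre, a):
    """Second-order expansion: P(Y) - (a**2 / 24) * Y * Z(Y)."""
    return norm_cdf(y_centre) - a * a / 24.0 * y_centre * norm_pdf(y_centre)


for y in (-2.0, -1.0, 0.5, 1.0, 2.0):
    print(y, exact_group_prob(y, 0.75), taylor_group_prob(y, 0.75), norm_cdf(y))
```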

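Even cruder approximations serve for the second derivatives, as these approximations only affect the rate of convergence of the iterative system and not the final answer. We shall use the ungrouped values, viz.

$$E\left(\frac{\partial^2 L}{\partial \alpha\,\partial \beta}\right) = -\sum\frac{nZ^2}{PQ}\,\frac{\partial Y}{\partial \alpha}\,\frac{\partial Y}{\partial \beta} \qquad (\alpha, \beta = a, b).$$

Using the usual notation

$$w = \frac{Z^2}{PQ},$$

the iteration equations reduce to

$$\sum nwx^2\,\delta a + \sum nwx\,\delta b = \frac{\partial L'}{\partial a} + \frac{a^2}{24}\sum nwYx, \qquad \sum nwx\,\delta a + \sum nw\,\delta b = \frac{\partial L'}{\partial b} + \frac{a^2}{24}\sum nwY,$$

where δa, δb are corrections to trial values $a_0$, $b_0$. Let $\delta a_0$, $\delta b_0$ be the corrections treating the data as ungrouped. Then if $\delta a_1 = \delta a - \delta a_0$, $\delta b_1 = \delta b - \delta b_0$, we have, since $Y = a_0 x + b_0$,

$$\sum nwx^2\,\delta a_1 + \sum nwx\,\delta b_1 = \frac{a^2}{24}\left[a_0\sum nwx^2 + b_0\sum nwx\right], \qquad \sum nwx\,\delta a_1 + \sum nw\,\delta b_1 = \frac{a^2}{24}\left[a_0\sum nwx + b_0\sum nw\right],$$

and the solution of these is clearly

$$\delta a_1 = \frac{a^2}{24}\,a_0, \qquad \delta b_1 = \frac{a^2}{24}\,b_0.$$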

Finally, if a′, b′ are maximum likelihood estimates of data treated as ungrouped, approximate solutions for the estimates allowing for grouping are

$$a = a'\left(1 + \frac{a'^2}{24}\right), \qquad b = b'\left(1 + \frac{a'^2}{24}\right).$$

The mean critical age is $-b/a = -b'/a'$, while the variance is

$$\frac{1}{a^2} = \frac{1}{a'^2}\left(1 + \frac{a'^2}{24}\right)^{-2} \approx \frac{1}{a'^2} - \frac{1}{12}.$$

Thus our approximation reduces to the usual Sheppard’s correction for grouping applied to the mean and variance estimated assuming ungrouped data.*

* If the group width is h, the multiplying factor above becomes $(1 + a'^2h^2/24)$, giving the usual reduction of $h^2/12$. It should be recorded that the analysis of this section originated from a suggestion of E. C. Fieller that such corrections would account for the difference between the two analyses.




At first this result is rather surprising. However, it must be remembered that the corrections to moments introduced by Sheppard are true for the populations, and their use on samples is simply an extension of the principle of estimation by equating sample moments to population moments introduced by K. Pearson. As all estimates are approximations to the true values of parameters, all estimates of moments from grouped data will be approximately related to the moments from ungrouped data by Sheppard’s relations.

Numerical illustration of approximation 

We may apply this approximation to the example used before. Using the same starting values, a = 0.70, b = -3.70, a normal probit analysis is performed.

The arithmetic necessary for this is set out in Table 3, using the improved technique 
suggested recently by Fieller. This table contains the complete working for all the necessary 
steps of the iteration, while Table 2 consists of only one step in the iteration for that method. 
This comparison of the two tables emphasizes the amount of computing this approximation 
saves. 

Treating the data as ungrouped we obtain

a′ = 0.734, b′ = -3.848.

Applying the correction we have

$$a = 0.734\left(1 + \frac{a'^2}{24}\right) = 0.750, \qquad b = -3.848\left(1 + \frac{a'^2}{24}\right) = -3.934.$$
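The adjustment for grouping amounts to a single multiplication. A minimal sketch, assuming the ungrouped estimates a′ = 0.734, b′ = -3.848 quoted above and a group width h (h = 1 in the example); `sheppard_adjust` is a hypothetical name.

```python
def sheppard_adjust(a_u, b_u, h=1.0):
    """Approximate grouped-data estimates from ungrouped probit estimates a', b':
    a = a'(1 + a'^2 h^2 / 24),  b = b'(1 + a'^2 h^2 / 24)."""
    factor = 1.0 + (a_u * h) ** 2 / 24.0
    return a_u * factor, b_u * factor


a_u, b_u = 0.734, -3.848              # ungrouped estimates from the example
a, b = sheppard_adjust(a_u, b_u)
mean = -b / a                         # equals -b'/a': the mean is not corrected
variance = 1.0 / a**2                 # close to 1/a'^2 - 1/12 (Sheppard)
print(f"a = {a:.3f}, b = {b:.3f}, mean = {mean:.3f}")
print(f"1/a^2 = {variance:.4f}, 1/a'^2 - 1/12 = {1 / a_u**2 - 1 / 12:.4f}")
```

The two variance figures differ only by a term of order $a'^2/192$, the next term in the expansion of $(1 + a'^2/24)^{-2}$.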

This approximation differs from the correct solution by about 1 part in 1000 and for most purposes would be adequate. However, if greater accuracy is required, only one step of the exact process is necessary to give the solution

a = 0.7513, b = -3.9373,

correct to four significant figures. In obtaining this result it is immaterial whether freshly calculated second derivatives or the ungrouped approximations -Σw, -Σwx, -Σwx² from Table 3 are used.

The calculation of confidence intervals 

At the present stage of development exact confidence limits are not available for the quantiles 
in an ungrouped probit analysis. The usual approximation used depends on the limiting 
normal form of distribution of maximum likelihood estimates and the substitution of sample 
estimates of the parameters into the expressions for the second moments of that distribution. 

Intervals of the same type can be deduced in the grouped case. Those for the quantile corresponding to a normal equivalent deviate Y are given (Fieller, 1944) by the roots of

$$(\Delta a^2 + z^2L_{22})\,x^2 + 2\{\Delta a(b - Y) - z^2L_{12}\}\,x + \{\Delta(b - Y)^2 + z^2L_{11}\} = 0,$$

where a, b are maximum likelihood estimates, z is the normal equivalent deviate corresponding to the confidence level required, $L_{11}$, $L_{12}$, $L_{22}$ are $\partial^2L/\partial a^2$, $\partial^2L/\partial a\,\partial b$, $\partial^2L/\partial b^2$ respectively, and finally $\Delta = L_{11}L_{22} - L_{12}^2$.

These roots are

$$x = \frac{z^2L_{12} - \Delta a(b - Y) \pm z\Delta^{1/2}(Q - z^2)^{1/2}}{\Delta a^2 + z^2L_{22}},$$

where $Q = -\{a^2L_{11} + 2a(b - Y)L_{12} + (b - Y)^2L_{22}\}$.
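The roots can be evaluated directly. A minimal sketch, assuming $L_{11}$, $L_{12}$, $L_{22}$ are the (negative) second derivatives of log L at the estimates, so that the covariance matrix of (a, b) is taken as the inverse of their negatives; `fieller_quantile_interval` is a hypothetical name, and the guard for an unbounded region is an addition not discussed in the text.

```python
import math


def fieller_quantile_interval(a, b, L11, L12, L22, Y=0.0, z=1.96):
    """Roots of (D a^2 + z^2 L22) x^2 + 2{D a (b - Y) - z^2 L12} x
    + {D (b - Y)^2 + z^2 L11} = 0, with D = L11 L22 - L12^2.

    L11, L12, L22 are second derivatives of log L (negative definite), so
    Var(a) = -L22 / D, Cov(a, b) = L12 / D and Var(b) = -L11 / D.
    """
    D = L11 * L22 - L12**2
    c = b - Y
    denom = D * a * a + z * z * L22
    if denom <= 0:
        # a is not significantly different from zero at this level,
        # so the confidence set is not a finite interval.
        raise ValueError("Fieller interval is unbounded")
    Q = -(a * a * L11 + 2 * a * c * L12 + c * c * L22)
    centre = z * z * L12 - D * a * c
    half = z * math.sqrt(D) * math.sqrt(Q - z * z)
    return (centre - half) / denom, (centre + half) / denom
```

The roots are on the scale of x; in the example, ages are recovered by adding the 9-year origin.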




In the example the 95% confidence interval for the median age is obtained by putting

Y = 0, z = 1.96, a = 0.7513, b = -3.9376,

$L_{11} = -3913.82$, $L_{12} = -783.214$, $L_{22} = -166.523$,*

Δ = 38315.3,

whence

$$x = \frac{116346.23703 \pm 37236.77602}{22266.79},$$

measured from 9 years as origin. Thus the median of the distribution of critical ages lies between 12.55 and 15.90 years. If the data were ungrouped the corresponding interval would be 12.70 to 15.76. The increase in length of the confidence interval with grouping is to be expected, as there is a loss of information on grouping.

If the interval is calculated using the grouped values of a and b but the ungrouped values of the L’s, we obtain 12.66 to 15.79, which is still too narrow. Thus in applying the approximate method, one iteration by the exact method at the end is desirable to furnish exact values of the L’s.

Summary

1. The necessary equations for an iterative solution for the maximum likelihood estimates of mean and variance in the underlying normal distribution of grouped probit data are derived.

2. In the case of equal-interval grouping a numerical process is described and illustrated.

3. For the same case it is shown that Sheppard’s corrections to the estimates obtained by treating the data as ungrouped are a close approximation to the exact answer, and this is illustrated on the same example.

4. The usual process for determining confidence intervals is used and the increase in length of interval noted.

The work described above has been carried out as part of the research programme of the National Physical Laboratory, and this paper is published by permission of the Director of the Laboratory.

REFERENCE

Fieller, E. C. (1944). Quart. J. Pharm. 17, 117.

* These are values obtained at the last step of iteration.







A GENERALIZATION OF POISSON’S BINOMIAL LIMIT 
FOR USE IN ECOLOGY 

By MARJORIE THOMAS, University College, London 

1. It is of interest to field ecologists to estimate the abundance of a given species in a commonwealth of plants. The method usually employed for this purpose is that of sampling by quadrat. A square lattice, the quadrat, is dropped at random points in the commonwealth, and the number of plants of the given species found in the quadrat is counted.

It is thus possible, with repeated sampling, to form a frequency distribution of the number of quadrats containing k plants (k = 0, 1, 2, ...), and the mean of this distribution gives an estimate of the density of the species, that is, the frequency with which, on the average, the species occurs in the commonwealth. In describing such observations mathematically, it has been customary to assume that the individual plant has no area and, further, that the plants are distributed randomly within the commonwealth studied. Provided the quadrat is large compared with the individual plant, the first assumption is justifiable, but the second, that of randomness, is recognized by the plant ecologist to be far removed from reality in the case of many species. Archibald (1948) has collected material and analysed it to show that for a number of species the hypothesis of randomness will not hold owing to the tendency of the plants to cluster together. We put forward here a series which will allow for this clustering, and which will also enable us to obtain an estimate of it.

2. It is a characteristic of the observational series collected from quadrat sampling that the variance is greater than the mean, a result which is attributable to the clustering of the observations. Were the plants distributed at random, it might be expected that Poisson’s binomial limit would describe the distribution (as, in fact, it does for some species), so a generalization of the Poisson suggests itself as appropriate for species in which the clustering affects the variance. Archibald (1948) fitted Neyman’s (1939) contagious series to such observations, the parameters $m_1$ and $m_2$ of the distribution being taken as proportional respectively to the number of clusters in the data and the average number of plants per cluster.

Another generalization arises from the following set-up. We assume an area over which a number of points is distributed at random. With each of these points a random number of other points is associated. The area is now divided into squares, and we calculate the probabilities that a square contains 0, 1, 2, ... points. Thus if x is the random variable associated with the first distribution of points, we write*

$$P\{x \text{ points in any one square}\} = \frac{m^x e^{-m}}{x!}.$$

Let y be the random variable associated with the random number of points related to the first points, so that

$$P\{y + 1 \text{ points in any one group}\} = \frac{\lambda^y e^{-\lambda}}{y!}.$$

* The notation P{ } is used in this paper to denote probability. Thus the expression P{x points in any one square} may be understood to mean ‘the probability that the number of points in any one square is x’.



Marjorie Thomas 

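If now the first points are taken to represent cluster centres, and the second points the number of additional plants (after the first) in a cluster, then for any quadrat,

$$P\{0 \text{ plants}\} = P\{0 \text{ clusters}\} = e^{-m},$$

$$P\{1 \text{ plant}\} = P\{1 \text{ cluster of 1 plant}\} = me^{-m}e^{-\lambda},$$

$$P\{2 \text{ plants}\} = P\{1 \text{ cluster of 2 plants}\} + P\{2 \text{ clusters of 1 plant}\} = me^{-m}\lambda e^{-\lambda} + \frac{m^2e^{-m}}{2!}e^{-2\lambda} = \frac{me^{-(m+\lambda)}}{2!}\left(2\lambda + me^{-\lambda}\right).$$

In general, we have

$$P\{k \text{ plants}\} = \sum_{r=1}^{k}\sum_{\alpha}P\{r \text{ clusters having } \alpha_1, \alpha_2, \ldots, \alpha_r \text{ plants}\},$$

where $\alpha_1 + \alpha_2 + \cdots + \alpha_r = k$ and $\alpha_j > 0$ for $j = 1, 2, \ldots, r$. The second summation is over all possible sets of α which fulfil these conditions. Thus

$$P\{k \text{ plants}\} = \sum_{r=1}^{k}\frac{m^re^{-m}}{r!}\sum_{\alpha}\prod_{j=1}^{r}\frac{\lambda^{\alpha_j - 1}e^{-\lambda}}{(\alpha_j - 1)!} = \sum_{r=1}^{k}\frac{m^re^{-m}}{r!}\,\lambda^{k-r}e^{-r\lambda}\sum_{\alpha}\prod_{j=1}^{r}\frac{1}{(\alpha_j - 1)!}.$$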

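3. We require to evaluate $\sum_{\alpha}\prod_{j=1}^{r} 1/(\alpha_j - 1)!$. For simplicity write $\beta_j = \alpha_j - 1$. Then

$$\sum_{j=1}^{r}\beta_j = \sum_{j=1}^{r}(\alpha_j - 1) = k - r,$$

and since $\alpha_j$ is an integer greater than zero, we have $\beta_j \geq 0$ for $j = 1, 2, \ldots, r$.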

Now consider the expansion of

$$(p_1 + p_2 + \cdots + p_r)^{k-r},$$

where $p_i = 1$ for $i = 1, 2, \ldots, r$. It is clear that

$$\frac{(k - r)!}{\prod_{j=1}^{r}\beta_j!}\prod_{j=1}^{r}p_j^{\beta_j}$$

is a term in this expansion and, writing $\sum_{\beta}$ to denote summation over all possible sets of β, we have

$$\sum_{\beta}(k - r)!\prod_{j=1}^{r}\frac{1}{\beta_j!} = (p_1 + p_2 + \cdots + p_r)^{k-r} = r^{k-r}.$$

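It follows that

$$P\{k \text{ plants in any one quadrat}\} = \sum_{r=1}^{k}\frac{m^re^{-m}}{r!}\,\frac{(r\lambda)^{k-r}e^{-r\lambda}}{(k - r)!}. \qquad (1)$$

Giving k successive values 1, 2, 3, ..., the required series is obtained. We shall refer to it as the double Poisson distribution. As a check we may note that

$$\sum_{k=1}^{\infty}\sum_{r=1}^{k}\frac{m^re^{-m}}{r!}\,\frac{(r\lambda)^{k-r}e^{-r\lambda}}{(k - r)!} = \sum_{r=1}^{\infty}\frac{m^re^{-m}}{r!}\sum_{k=r}^{\infty}\frac{(r\lambda)^{k-r}e^{-r\lambda}}{(k - r)!} = \sum_{r=1}^{\infty}\frac{m^re^{-m}}{r!} = 1 - e^{-m},$$

which is as would be expected.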
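Equation (1) translates directly into code. A minimal sketch (the function name is hypothetical) that evaluates P{k} term by term; it can be checked against the closed form for P{2 plants} given above and against the simple Poisson case λ = 0.

```python
import math


def double_poisson_pmf(k, m, lam):
    """P{k plants in a quadrat} from equation (1); P{0 plants} = exp(-m).

    m   -- mean number of clusters per quadrat
    lam -- mean number of additional plants per cluster (after the first)
    """
    if k == 0:
        return math.exp(-m)
    total = 0.0
    for r in range(1, k + 1):
        clusters = m**r * math.exp(-m) / math.factorial(r)       # r clusters
        extras = (r * lam) ** (k - r) * math.exp(-r * lam) / math.factorial(k - r)
        total += clusters * extras                                # k - r extra plants
    return total


m, lam = 0.573, 1.755          # illustrative values
print([round(double_poisson_pmf(k, m, lam), 4) for k in range(6)])
```

When λ = 0 only the r = k term survives, since `0.0 ** 0` is 1 in Python, so the function reduces to the simple Poisson with parameter m.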



20 A generalization of Poisson's binomial limit for use in ecology 

4. The moments of the double Poisson distribution may be obtained most simply by the device just used of inverting the order of summation. For the first moment we have

$$\mu_1' = \sum_{k=1}^{\infty}k\sum_{r=1}^{k}\frac{m^re^{-m}}{r!}\,\frac{(r\lambda)^{k-r}e^{-r\lambda}}{(k - r)!} = \sum_{r=1}^{\infty}\frac{m^re^{-m}}{r!}\sum_{k=r}^{\infty}[(k - r) + r]\,\frac{(r\lambda)^{k-r}e^{-r\lambda}}{(k - r)!} = \sum_{r=1}^{\infty}\frac{m^re^{-m}}{r!}(r\lambda + r) = m(1 + \lambda). \qquad (2)$$

Similarly, for the second moment about the origin,

$$\mu_2' = \sum_{r=1}^{\infty}\frac{m^re^{-m}}{r!}\sum_{k=r}^{\infty}[(k - r)^2 + 2r(k - r) + r^2]\,\frac{(r\lambda)^{k-r}e^{-r\lambda}}{(k - r)!} = \sum_{r=1}^{\infty}\frac{m^re^{-m}}{r!}(r\lambda + r^2\lambda^2 + 2r^2\lambda + r^2) = m\lambda + (m + m^2)(1 + \lambda)^2.$$

Converting to the moment about the mean,

$$\mu_2 = m\lambda + m(1 + \lambda)^2 = m(1 + 3\lambda + \lambda^2). \qquad (3)$$
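Equations (2) and (3) can be confirmed by summing the series numerically. A sketch under the assumption that truncating at k = 100 is harmless for moderate m and λ; the parameter values (close to the Method I estimates for P. maritima) are illustrative only, and the function names are hypothetical.

```python
import math


def double_poisson_pmf(k, m, lam):
    """Equation (1); P{0} = exp(-m)."""
    if k == 0:
        return math.exp(-m)
    return sum(
        m**r * math.exp(-m) / math.factorial(r)
        * (r * lam) ** (k - r) * math.exp(-r * lam) / math.factorial(k - r)
        for r in range(1, k + 1)
    )


def moments_by_summation(m, lam, kmax=100):
    """Mean and variance by direct summation over k = 0, ..., kmax."""
    probs = [double_poisson_pmf(k, m, lam) for k in range(kmax + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    second = sum(k * k * p for k, p in enumerate(probs))
    return mean, second - mean**2


m, lam = 2.209, 1.286
mean, var = moments_by_summation(m, lam)
print(mean, m * (1 + lam))                 # equation (2)
print(var, m * (1 + 3 * lam + lam**2))     # equation (3)
```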


5. The numerical fitting of the series to observational material can be carried out in at least two ways. The first, and statistically the more usual, is to calculate the mean and variance of the observed series and, equating these numbers to the theoretical mean and variance, solve for m and λ. The disadvantage of this procedure from the ecologist’s point of view is that an exhaustive count of all the plants of the species considered in the quadrats is necessary in order to be able to estimate the average number of clusters per quadrat, m, and the mean number of plants per cluster, 1 + λ. As an alternative, therefore, we may obtain maximum likelihood estimates of m and λ, following the procedure given by Fisher (1922) and outlined by Tippett (1932) for the Poisson series. To obtain these estimates we need a knowledge only of the number of quadrats with zero plants, $n_0$, the number of quadrats with one plant, $n_1$, and the total number of quadrats, N. The values chosen as estimates of m and λ are those which maximize the expression

$$L = \log_e\left[(e^{-m})^{n_0}\,(me^{-m}e^{-\lambda})^{n_1}\,(1 - e^{-m} - me^{-m}e^{-\lambda})^{N - n_0 - n_1}\right],$$

and hence satisfy the equations

$$\frac{\partial L}{\partial m} = \frac{\partial L}{\partial \lambda} = 0.$$

On performing the differentiations, we find that m and λ are given by the simultaneous equations

$$e^{-m} = n_0/N, \qquad me^{-\lambda} = n_1/n_0, \qquad (4)$$

which may be quickly solved. It will be noted that these equations are precisely those obtained by equating the observed and theoretical frequencies in the first two groups. The




values obtained may then be substituted in the expression m(1 + λ) to obtain the maximum likelihood estimate of the average density of plants of the particular species considered. It may be noted in passing that, if λ = 0, when the double Poisson series reduces to the simple Poisson with parameter m, the equation $e^{-m} = n_0/N$ is the same as that found by Tippett in the paper referred to above. This is as expected. Data are supplied by Archibald for Armeria maritima and Plantago maritima, two species which were counted on Blakeney Marsh. The theoretical series obtained by each of the two methods are set out below in Tables 1 and 2 and are compared with the series given by a Poisson distribution. $\chi^2$ is not significant


Table 1. Distribution of Armeria maritima 


No. of plants 
per quadrat 

0 

1 

2 

3 

4 

6 

6 

7 

8 

_ 

9 

10 

Tj 

and 
over ! 

r.j 

Total j 

Observed number of 

57 

6 

12 

6 

5 

5 

7 


M 

1 

1 

■ 


quadrats 

Poisson: Expectations 

20-61 

32-64 

25-70 

13-54 

5-35 

1-69 

0-45 




... 

B 

MM 

Method I : Expectations 

66-44 

6-68 

10-05 

9-56 

fl-77 

4-33 

2-75 

1-70 

1-11 

0-fiH 

0-41 

0-56 

1 IW-OOJ 

Method II: Expectations* 

67-07 

6-99 

_ 

10-12 

e-46 

6-51 

4-09 

2-56 

1-61 

0-99 

0-59 

0-35 

0-66 

1 ItKhfg.) j 

j ...j 


μ′₁ = 1.58, μ₂ = 6.3036. χ² (method I) = 3.725†, χ² (method II) = 4.031. Estimate of density, m(1 + λ), by method II: 1.50 plants per quadrat.

* The first two frequencies estimated by method II should be exactly equal to the observed frequencies, but this would entail estimating m and λ to a large number of decimal places. We have taken them to three significant figures. Calculation of frequencies using the estimated m and λ thus produces a slight discrepancy.

† The suffix attached to χ² here indicates the number of degrees of freedom against which its value must be judged. Cells containing small expected frequencies were combined, so that no group had an expectation of less than 5 units.


Table 2. Distribution of Plantago maritima 


No. of plants 
per quadrat 

0 

1 

2 

3 

4 

5 

6 

7 

8 

.... 

$ 

10 | 

Observed number of 

12 

8 

9 

13 

6 

8 

11 

7 

K 

7 

} 

quadrats 









9 i 

Poisson: Expectations 

m 

3-24 

8-17 

13-76 

17-37 

17-54 

14-76 

10-65 

6-72 

3-77 

J j 

Method I: Expectations 


ESQ 


11-22 

10-81 

■umi 

H-80 

7-41 

0-02 

4-73 

^-69' 

Method 11 : Expectations 

m 

7-99 

U! 

12-10 

11-35 

10-14 

8-59 

6-95 

5-42 

•MW 

2-9»| 

No. of plants per 

11 

12 

13 

14 

15 





j 

20 i 

j 

quadrat 

10 

17 

18 

19 

and j 

Total j 

— 










<»V«P :j 

3 

Observed number of 

4 

1 

i 



1 



1 


loti j 

quadrats 





.. 


Poisson: Expectations 

0-88 

0'37 

0-14 

0-05 

0*02 

0-01 

0-45 

0-28 




! 

.. '! 

Method I: Expectations 
Method II: Expectations 

2-70 

2-15 

1-97 

1-48 

1-40 

1-01 

0-98 

0-67 

0-67 

0-44 

0-30 

0-18 

0-20 

0-11 

0-13 

0-07 

-• nilHhWti 
0-15-j- lOQ-UOl 
0-loji KXMW? 

„ 1 : 


r i- u-uo,g a = u- 3875 . Vi (method I) 
by method II; 4-38 plants per quadrat. 


.mil 4 As 













for the expectations obtained by either method, and we may conclude that the double Poisson series describes the material adequately. The simple Poisson distribution is clearly unsuitable. The parameters of the series as estimated by the two methods are given in Table 3. Thus for A. maritima the mean number of clusters per quadrat is estimated to be 0.573 and the average number of plants per cluster 2.755, using the first method; but it is clear that if maximum likelihood estimates are obtained from the first two groups only, the difference is small. This difference is a little more marked for P. maritima (a point not to be wondered at when we consider the irregularity of the observed frequencies), but even so it is not of sufficient magnitude to invalidate any estimates of the number of plants in a given area made by using method II.
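Method II needs only $n_0$, $n_1$ and N, so the estimates follow from two logarithms. A minimal sketch, assuming N = 100 quadrats with $n_0 = 57$ and $n_1 = 6$ for Armeria maritima as read from Table 1; `method_ii` is a hypothetical name.

```python
import math


def method_ii(n0, n1, N):
    """Solve equations (4): exp(-m) = n0/N and m exp(-lam) = n1/n0."""
    if n0 == 0 or n1 == 0:
        raise ValueError("method II needs n0 > 0 and n1 > 0")
    m = -math.log(n0 / N)
    lam = math.log(m * n0 / n1)
    return m, lam


# Armeria maritima, assuming N = 100, n0 = 57, n1 = 6.
m, lam = method_ii(57, 6, 100)
print(f"m = {m:.3f}, 1 + lambda = {1 + lam:.3f}, density = {m * (1 + lam):.2f}")
```

A negative λ (which happens when $mn_0 < n_1$) would fall outside the model, since λ is the mean of a Poisson variable.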


Table 3. Comparison of two methods of estimating parameters

              Armeria maritima          Plantago maritima
            Method I   Method II      Method I   Method II
m            0.573      0.562          2.209      2.121
1 + λ        2.755      2.675          2.286      2.159


6. The technique used to obtain estimates of m and λ in method II may be carried a stage further to give the large-sample standard errors of these estimates, which are of use in deciding when this comparatively simple method may be used without introducing too great an error. To obtain these standard errors we form the second derivatives of L, where L is the quantity defined above, and then solve the equations

$$-E\left(\frac{\partial^2 L}{\partial m^2}\right) = \frac{1}{(1 - \rho^2)\,\sigma_m^2}, \qquad -E\left(\frac{\partial^2 L}{\partial m\,\partial \lambda}\right) = -\frac{\rho}{(1 - \rho^2)\,\sigma_m\sigma_\lambda}, \qquad -E\left(\frac{\partial^2 L}{\partial \lambda^2}\right) = \frac{1}{(1 - \rho^2)\,\sigma_\lambda^2},$$

where $\sigma_m$, $\sigma_\lambda$ are respectively the standard errors of the estimates of m and 1 + λ, and ρ is the correlation between them. On performing the differentiations and solving the equations so obtained, we reach the results

$$\sigma_m^2 = \frac{1}{N}\,\frac{1 - e^{-m}}{e^{-m}}, \qquad (5)$$

$$\sigma_\lambda^2 = \frac{m - e^{-m}e^{-\lambda} + e^{-\lambda}(1 - m)^2}{Nm^2e^{-m}e^{-\lambda}}, \qquad (6)$$

where the parameters m and λ on the right-hand sides of these equations have their true population values. These values will generally be unknown, so that sample estimates must be used. We then obtain

$$\sigma_m^2 = \frac{1}{N}\,\frac{1 - n_0/N}{n_0/N}, \qquad (7)$$

$$\sigma_\lambda^2 = \frac{1}{N}\left[\frac{1}{n_1/N} - \frac{1}{m^2} + \frac{(1 - m)^2}{m^2\,n_0/N}\right], \qquad (8)$$

where $m = -\log_e(n_0/N)$. Clearly the standard errors are very large if either $n_0/N$ or $n_1/N$ is very small, and the method breaks down if either $n_0$ or $n_1$ is zero. Tables 4 to 7 show the relation between $n_0/N$ and $n_1/N$ and (a) m and 1 + λ from equations (4); (b) $\sigma_m$ and $\sigma_\lambda$, for N = 100, from equations (5) and (6); (c) $\sigma_m/m$ and $\sigma_\lambda/(1 + \lambda)$, for N = 100.
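Equations (7) and (8) are likewise simple to evaluate. A minimal sketch (the function name is hypothetical) that should reproduce tabulated entries of Tables 4 and 6 to the printed three decimals, for example the cell $n_0/N = n_1/N = 0.05$ with N = 100.

```python
import math


def method_ii_standard_errors(n0, n1, N):
    """Large-sample standard errors of m and 1 + lambda from (7) and (8)."""
    p0, p1 = n0 / N, n1 / N
    m = -math.log(p0)
    var_m = (1 - p0) / (N * p0)                                        # equation (7)
    var_lam = (1 / p1 - 1 / m**2 + (1 - m) ** 2 / (m**2 * p0)) / N     # equation (8)
    return m, math.sqrt(var_m), math.sqrt(var_lam)


# One cell of Tables 4 and 6: n0/N = n1/N = 0.05 with N = 100.
m, s_m, s_lam = method_ii_standard_errors(5, 5, 100)
print(f"m = {m:.3f}, sigma_m = {s_m:.3f}, sigma_m/m = {s_m / m:.3f}, sigma_lambda = {s_lam:.3f}")
```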




Marjorie Thomas 

If the tables are entered with the observed values of $n_0/N$ and $n_1/N$, we obtain the estimates of $m$ and $1+\lambda$ and of their standard errors. The absolute values of $\sigma_m$ and $\sigma_\lambda$ decrease as $n_0/N$ and $n_1/N$ increase. The relative values of the standard errors, however, behave differently: they decrease to a minimum and then increase again.


Table 4. Values of $m$, $\sigma_m$ and $\sigma_m/m$ for $N = 100$

$n_0/N$         0.05   0.10   0.20   0.30   0.40   0.50   0.60   0.70   0.80   0.90
$m$             2.996  2.303  1.609  1.204  0.916  0.693  0.511  0.357  0.223  0.105
$\sigma_m$      0.436  0.300  0.200  0.153  0.122  0.100  0.082  0.065  0.050  0.033
$\sigma_m/m$    0.146  0.130  0.124  0.127  0.134  0.144  0.160  0.183  0.224  0.317


Table 5. Values of $1+\lambda$

$n_1/N$ \ $n_0/N$   0.05   0.10   0.20   0.30   0.40   0.50   0.60   0.70   0.80   0.90
0.05                [?]    2.527  [?]    2.977  2.992  2.936  2.814  2.609  2.272  1.637
0.10                1.404  1.834  2.169  2.284  2.299  2.243  [?]    1.918  1.579  -
0.20                -      1.141  1.476  1.591  1.606  1.550  [?]    1.223  -      -
0.30                -      -      1.070  1.186  1.200  1.144  1.022  -      -      -


Table 6. Values of $\sigma_\lambda$ for $N = 100$

$n_1/N$ \ $n_0/N$   0.05   0.10   0.20   0.30   0.40   0.50   0.60   0.70   0.80   0.90
0.05                0.536  0.480  [?]    0.441  0.434  [?]    0.421  0.410  0.388  0.317
0.10                0.433  [?]    [?]    0.307  0.297  [?]    0.277  0.261  0.225  -
0.20                -      0.283  0.231  0.210  0.196  0.182  0.164  0.134  -      -
0.30                -      -      0.191  0.166  0.147  0.128  0.101  -      -      -


Table 7. Values of $\sigma_\lambda/(1+\lambda)$ for $N = 100$

$n_1/N$ \ $n_0/N$   0.05   0.10   0.20   0.30   0.40   0.50   0.60   0.70   0.80   0.90
0.05                0.256  0.190  0.158  0.148  0.146  0.146  0.149  0.157  0.171  0.193
0.10                0.309  0.197  0.148  0.134  0.129  [?]    0.131  0.136  0.143  -
0.20                -      0.248  0.156  0.132  0.122  0.117  0.116  [?]    -      -
0.30                -      -      0.179  0.140  0.123  0.112  0.099  -      -      -


Values of $1+\lambda$ and $\sigma_\lambda$ are given only for a somewhat curtailed range of $n_1/N$. This is because, as $n_1/N$ increases for fixed $n_0/N$, the equations (4) yield successively smaller and eventually negative values of $\lambda$. In the limiting case the two ratios are exactly equal to the first two terms of a simple Poisson series, that is, a special case of the double Poisson with $\lambda = 0$. Although it is possible that the double Poisson series, with $\lambda$ taking negative values, may provide a 'graduation' for




















24 A generalization of Poisson's binomial limit for use in ecology 

observational data, the parameters in this case lose their physical significance, and the situation has therefore not been further considered here. If the true population $\lambda$ is zero or has a small positive value, we may expect that maximum likelihood estimates of $\lambda$ will sometimes be negative, owing to sampling fluctuations. It follows that estimates of $\lambda$ which are near to zero should be treated with caution, even though the standard errors of such estimates may not be large.

7. We have still to compare the relative accuracy of methods I and II in the estimation of the population mean, since this is the quantity with which the ecologist is primarily concerned. By method I the mean of the sample is taken as the estimate of the population mean, with a standard error $\sqrt{\mu_2/N}$, where $\mu_2$ is given by equation (3).

Using the maximum likelihood method, the simplest way of estimating the population mean is to estimate $m$ and $1+\lambda$ from equations (4), and then to form the product $M = m(1+\lambda)$. The large-sample standard error of $M$ is then given by

$$\sigma_M^2 = m^2\sigma_\lambda^2 + (1+\lambda)^2\sigma_m^2 + 2m(1+\lambda)\,\rho\,\sigma_m\sigma_\lambda.$$

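On substituting values already found in the right-hand side of this equation, we reach the result

$$\sigma_M^2 = \frac{1}{N}\left[\frac{(m-\lambda-2)^2}{e^{-m}} + \frac{m^2}{m e^{-m} e^{-\lambda}} - (\lambda+2)^2\right]. \tag{9}$$

As before, this may be estimated by

$$\sigma_M^2 \approx \frac{1}{N}\left[\frac{(m-\lambda-2)^2}{n_0/N} + \frac{m^2}{n_1/N} - (\lambda+2)^2\right], \tag{10}$$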


where for $m$ and $\lambda$ we substitute the estimates obtained from equations (4).

(a) Table 8 shows values of $M$ and (b) Table 9 values of the standard error of the maximum likelihood estimate of $M$ for $N = 100$* (from equation (9)), in terms of the population proportions in the first two groups. These tables may also be used to give estimates of $M$ and $\sigma_M$ based on the observed frequencies $n_0$ and $n_1$.
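A minimal Python sketch of the estimate $M = m(1+\lambda)$ and of its standard error from (10) follows. It again assumes that equations (4) take the form $n_0/N = e^{-m}$ and $n_1/N = m e^{-m} e^{-\lambda}$. The function name `mean_and_se` is hypothetical. For the proportions 0.05 and 0.05 with $N = 100$ it gives 6.283 and 1.304, the corresponding entries of Tables 8 and 9.

```python
import math

def mean_and_se(n0: int, n1: int, N: int):
    """Return (M, sigma_M): M = m(1 + lambda), sigma_M from equation (10). Hypothetical helper."""
    p0, p1 = n0 / N, n1 / N
    m = -math.log(p0)                    # assumed form of equations (4)
    lam = math.log(m * p0 / p1)
    M = m * (1.0 + lam)                  # estimated population mean
    var_M = ((m - lam - 2.0) ** 2 / p0 + m ** 2 / p1 - (lam + 2.0) ** 2) / N
    return M, math.sqrt(var_M)

M, se = mean_and_se(5, 5, 100)
print(round(M, 3), round(se, 3))         # 6.283 1.304
```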


Table 8. Values of the mean $M$

$n_1/N$ \ $n_0/N$   0.05   0.10   0.20   0.30   0.40   0.50   0.60   0.70   0.80   0.90
0.05                6.283  5.819  [?]    3.585  2.741  2.035  [?]    0.931  [?]    [?]
0.10                4.206  4.223  [?]    2.750  2.106  1.554  [?]    0.683  [?]    -
0.20                -      2.627  [?]    1.916  1.471  1.074  [?]    0.436  -      -
0.30                -      -      [?]    1.427  1.099  0.793  [?]    -      -      -

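Table 9. Standard error $\sigma_M = \Bigl\{\frac{1}{N}\Bigl[\frac{(m-\lambda-2)^2}{e^{-m}} + \frac{m^2}{m e^{-m} e^{-\lambda}} - (\lambda+2)^2\Bigr]\Bigr\}^{1/2}$ for $N = 100$

$n_1/N$ \ $n_0/N$   0.05   0.10   0.20   0.30   0.40   0.50   0.60   0.70   0.80   0.90
0.05                1.304  1.042  0.789  0.623  0.495  [?]    [?]    0.215  0.138  0.062
0.10                0.954  0.691  0.529  0.426  0.342  [?]    [?]    0.146  0.089  -
0.20                -      0.471  0.325  0.264  0.213  [?]    [?]    0.082  -      -
0.30                -      -      0.233  0.181  0.144  [?]    [?]    -      -      -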

* The standard error for a sample of size $N'$ is obtained by multiplying by $10/\sqrt{N'}$.


















Table 10 shows the ratio of (i) the standard error of the estimate of $M = m(1+\lambda)$ based on the mean of the complete sample, to (ii) the standard error of the maximum likelihood estimate. That is, it gives the ratio of $\sqrt{\mu_2}$ from equation (3) to $\sigma_M\sqrt{N}$ from equation (9). This ratio is independent of $N$, but it is, of course, a large-sample limit. The figures in this table indicate what loss in accuracy must be accepted in return for the speed gained by counting only the first two frequencies, $n_0$ and $n_1$. For instance, for the standard error of the more approximate method to be no more than about 1⅓ times that of the more accurate, the distribution should be such that at least 60% of the quadrats contain one or no plants. The table also shows the relative number of counts needed for a given accuracy. For example, if the expected proportions are $n_0/N = 0.20$ and $n_1/N = 0.30$, then $(1/0.60)^2 = 2.8$ times as many quadrats must be counted using method II as using method I for roughly the same accuracy in the estimate of $m(1+\lambda)$.

It is for the ecologist to decide on the balance to strike between accuracy and field labour. 
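The ratio in Table 10 can be checked with a short Python sketch. The sketch takes $\mu_2$ of equation (3) to be the variance of the double Poisson distribution, $\mu_2 = m[(1+\lambda)^2 + \lambda] = m(1 + 3\lambda + \lambda^2)$. That is an assumption in this sketch. It again takes equations (4) to be $n_0/N = e^{-m}$ and $n_1/N = m e^{-m} e^{-\lambda}$. The function name `efficiency_ratio` is hypothetical.

```python
import math

def efficiency_ratio(p0: float, p1: float) -> float:
    """Ratio (i)/(ii) of Table 10, sqrt(mu2) / (sigma_M * sqrt(N)); N cancels.

    Assumes mu2 = m * (1 + 3*lambda + lambda**2), the double Poisson variance,
    and equations (4) in the form p0 = exp(-m), p1 = m * exp(-m) * exp(-lambda).
    """
    m = -math.log(p0)
    lam = math.log(m * p0 / p1)
    mu2 = m * (1.0 + 3.0 * lam + lam ** 2)
    n_var_M = (m - lam - 2.0) ** 2 / p0 + m ** 2 / p1 - (lam + 2.0) ** 2   # N * sigma_M**2, eq. (9)
    return math.sqrt(mu2 / n_var_M)

r = efficiency_ratio(0.20, 0.30)
print(f"ratio {r:.3f}; method II needs about {1.0 / r ** 2:.1f} times as many quadrats")
```

For $n_0/N = 0.20$ and $n_1/N = 0.30$ this gives a ratio of about 0.60 and a factor of about 2.8, which agrees with the example in the text.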


Table 10. Ratio of standard errors of estimates of $M$, (i)/(ii):
(i) from moment solution, (ii) from maximum likelihood solution

$n_1/N$ \ $n_0/N$   0.05   0.10   0.20   0.30   0.40   0.50   0.60   0.70   0.80   0.90
0.05                0.311  0.410  0.510  0.580  0.640  0.695  0.740  0.805  0.887  0.947
0.10                0.280  0.450  0.581  0.657  0.718  0.772  0.824  0.877  0.935  -
0.20                -      0.387  0.636  0.730  0.801  0.855  0.908  0.958  -      -
0.30                -      -      0.601  0.766  0.850  0.911  0.963  -      -      -


8. We may conclude that the double Poisson series may prove useful for the description of plant distributions. We have tried to design the mathematical set-up so that the parameters of the derived distribution are capable of physical interpretation. Sufficient data are not available to test whether the parameters $m$ and $1+\lambda$ do, in fact, measure the average number of clusters per quadrat and the average number of plants per cluster, respectively. However, the good fit obtained for the two series discussed above suggests that, for these series at least, the mathematical model provides an excellent graduation. Further, if this series can be used in other cases where the maximum likelihood method of estimation is satisfactory, the labour of the field ecologist will be considerably lightened, because a complete enumeration of plant numbers need not be carried out. A knowledge of the zero and unit classes is all that is required. The tables provided should help both in estimating the parameters and in deciding which method of estimation should be used.

I wish to thank Prof. E. S. Pearson and Dr F. N. David for suggestions and help in the preparation of this paper, and Miss E. E. A. Archibald for supplying me with data for the numerical illustrations.


REFERENCES 

Archibald, E. E. A. (1948). Ann. Bot., Lond., N.S. 12, 221.
Fisher, R. A. (1922). Phil. Trans. A, 222, 309.
Neyman, J. (1939). Ann. Math. Statist. 10, 35.
Tippett, L. H. C. (1932). Proc. Roy. Soc. A, 137, 434.



[ 26 ] 


THE ESTIMATION AND COMPARISON OF RESIDUAL REGRESSIONS 
WHERE THERE ARE TWO OR MORE RELATED SETS 
OF OBSERVATIONS 

By A. H. CARTER, King's College, Cambridge 
1. Introduction 

The problem to be investigated concerns a series of parallel samples, each comprising observations in two or more variates, in which the corresponding members of the different samples have certain elements in common. Such a situation would occur in the case of (a) successive sets of measurements on the same animals or plants or experimental plots, or (b) varietal trials where a complete experiment (involving each of the varieties to be tested) is repeated in a number of districts. The essential feature is that the different samples comprise observations relating to the same underlying material, be it animals, plants or plots. We wish to estimate the residual regression effects where the effect on one measure of a number of others is being studied, and to develop tests of their homogeneity as between samples. Since the samples are non-independent, the usual tests of homogeneity of regressions among independent samples (Bartlett, 1934; Welch, 1935) will no longer be applicable. Further, in so far as we wish to ascertain and compare the net regression effects after eliminating the effect of the underlying common elements, the estimates of the regression coefficients themselves will differ from those generally computed.

As an example of the type of problem to be considered, suppose we wished to compare the regression of sugar content on yield for several varieties of sugar beet. Replicated plots might be set up in each of a number of different localities, with one variety to each locality. Regression coefficients for each variety could then be tested for homogeneity by the usual methods, the samples being independent. Obviously, however, any conclusions drawn from such an experiment would be of little value, since varietal and regional differences would be confounded. Suppose, instead, that neighbouring plots in each district were allocated to the several varieties, one or more plots per variety, and that the experiment was repeated in a number of districts. If region-variety interaction could be assumed negligible, inherent differences in soil and climate between localities would affect each variety to the same extent. Any conclusions now drawn would clearly be of general application. On the other hand, since every district contributes one or more pairs of observations to each varietal sample, the samples will no longer be independent, and the standard methods will fail.

The problem of testing for significance the difference between the regression coefficients from two correlated samples was first investigated by Yates (1939). He assumed that the samples came from two populations in which the residual of the dependent variate (after allowing for the effects of the regression) was normally distributed. On that assumption he demonstrated that the difference between two simple or partial regression coefficients was itself normally distributed, with variance expressible in terms of the residual variances and covariance of the two populations, and he showed how these quantities might be estimated from the data. Since, however, the estimate of the variance of the difference between the regression coefficients was based on an unassigned number of degrees of freedom, the resulting test was only an approximate one;



A. H. Carter m4 

its results must be interpreted with care in the case of small samples. The tests developed below are exact tests, and, moreover, are applicable to the case of any number of samples.

In the course of an experiment by the Wool Industries Research Association (Galpin, 1941), Daniels applied the method of fitting constants to the present problem. In the experiment, measurements of two external characteristics of a number of sheep were recorded at three-monthly intervals. The problem was to determine whether the relationship between the two variates changed with the season.

The method employed below to derive the estimates and significance tests in the general case is also that of fitting constants. This technique has wide application to statistical problems. It is particularly useful where the magnitudes of certain effects have to be estimated and the effects tested for significance. Wherever it is employed, the method has the merit of showing clearly the basic assumptions which have been made. The observed values are presumed drawn from a population whose form (the 'model') is specified, depending on a number of unknown parameters. The constants to be fitted are the estimates of these parameters. Where the residual variation in the population is assumed normal, as will be the case throughout this paper, the fitting of constants by least squares will, of course, yield the maximum likelihood estimates.

The general theoretical problem is first formulated, and a general result forming the basis of the tests of homogeneity developed later is considered. We proceed to derive specific formulae, in convenient form for calculation, for the general case of $p$ correlated samples with $q$ independent variates. The special cases of $p = 2$ and of $q = 1$ are then discussed, and the known results for independent samples are briefly deduced and compared. After a discussion of the underlying assumptions made, and a comparison of the method with that proposed by Yates, the paper concludes with two examples by way of illustration.


2. Discussion of method

The general case to be considered is that of $p$ samples or 'lots', each of $n$ observations in the $(q+1)$ variates $y, x_1, x_2, \ldots, x_q$. Denote the $j$th set of observations in the $i$th sample by $(y_{ij}, x_{uij})$ ($u = 1, 2, \ldots, q$; $i = 1, 2, \ldots, p$; $j = 1, 2, \ldots, n$). The $np$ sets may be considered as an $n \times p$ array, the $p$ columns corresponding to the samples. These are correlated owing, it is supposed, to a common effect (the 'correlation-effect') running through their $j$th members, i.e. through the $j$th row in the array considered.

We regard the $p$ 'lots' as samples from $p$ related subpopulations in which the values of $y_{ij}$ are assumed specified by

$$y_{ij} = \sum_{u=1}^{q} \beta_{ui} x_{uij} + \gamma_i^* + \alpha_j^* + \mu^* + \epsilon_{ij}, \tag{2.1}$$

where $\beta_{ui}$ ($u = 1, 2, \ldots, q$) measures the true regression effects in the $i$th subpopulation,

$\gamma_i^*$ measures the 'lot' effect (column-effect) common to the members of the $i$th subpopulation,

$\alpha_j^*$ measures the 'correlation-effect' (row-effect) common to the $j$th members of all subpopulations,

$\mu^*$ denotes the general (population) mean,

and $\epsilon_{ij}$ is a random residual, normally distributed about zero mean with constant (unknown) variance $\sigma^2$.



28 The estimation and comparison of residual regressions 

In the sugar-beet experiment cited above, for example, the different varieties constitute 
the ‘lots’, the column-effect would be that due to variety, and the row-effect that due to 
locality. 

It is required (i) to obtain estimates $b_{ui}$ of the partial regression coefficients $\beta_{ui}$, (ii) to test hypotheses of the type $\beta_{ui} = \beta_u$ ($i = 1, 2, \ldots, p$), i.e. to test the homogeneity as between samples of the partial regression coefficients, and (iii) to derive a test of the significance of the difference between any two regression estimates, say those of $y$ on $x_u$ for the $l$th and $m$th samples, the hypothesis to be tested being

$$\beta_{ul} = \beta_{um}.$$

In view of the assumed relation (2.1), the tests under (ii) and (iii) above are particular cases of the test of a general linear hypothesis (Kolodziejczyk, 1935). In the present case, the likelihood approach yields a test criterion depending on the quantities $S_a$ and $S_r$. Here $S_a$ is the absolute minimum (i.e. for variations in all the parameters of (2.1)), and $S_r$ is the relative minimum (i.e. taking account of the conditions implied by the hypothesis under test), of the sum of squares

$$\sum_{i=1}^{p}\sum_{j=1}^{n} (y_{ij} - \eta_{ij})^2,$$

where

$$\eta_{ij} = E(y_{ij}) = \sum_{u=1}^{q} \beta_{ui} x_{uij} + \gamma_i^* + \alpha_j^* + \mu^*. \tag{2.2}$$

It will be found convenient, however, to adopt a slightly different approach, and to regard the problem entirely as a formal multiple regression one. For this, 'dummy' variates will be introduced, as explained below, to carry the parameters $\gamma_i^*$, $\alpha_j^*$, $\mu^*$ in (2.2). (For the use of dummy variates in this connexion see, for example, Bartlett (1933) and Yates (1933).) Using this approach, all our required tests of homogeneity are particular cases of the following general result.

To test the homogeneity of a group of the partial regression coefficients in a single sample. 

Suppose the regression of $y$ on $t$ independent variates $x_1, x_2, \ldots, x_t$ is linear, so that the expected value $\eta$ of $y$ is

$$\eta = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_r x_r + \beta_{r+1} x_{r+1} + \cdots + \beta_t x_t \tag{2.3}$$

(the variates being measured from their means). $(y - \eta)$ is then assumed normally distributed about zero mean, with unknown variance $\sigma^2$. It is required to test, on the basis of a sample of size $n$, the hypothesis, $H_0$, that $\beta_1 = \beta_2 = \cdots = \beta_r = \beta_0$, say. This is clearly a linear hypothesis of order $(r-1)$, and the appropriate test follows from Kolodziejczyk's general theorem.

Denote the $j$th set of observations by $y_j, x_{1j}, x_{2j}, \ldots, x_{tj}$ ($j = 1, 2, \ldots, n$), so that, the $x$'s being assumed errorless,

$$E(y_j) = \eta_j = \beta_1 x_{1j} + \beta_2 x_{2j} + \cdots + \beta_t x_{tj}.$$

Write $S$ for $\sum_{j=1}^{n} (y_j - \eta_j)^2$. Let $S_a$ be the absolute minimum of $S$ for variations in all $t$ parameters (the $\beta$'s), and let $S_r$ be the relative minimum of $S$ under the conditions of $H_0$, i.e. of $S'$, say.

Then, provided $H_0$ holds, the two members on the right-hand side of the identity

$$S_r \equiv S_a + (S_r - S_a) \tag{2.4}$$

are distributed independently as $\chi^2\sigma^2$ with $(n-t)$ and $(r-1)$ degrees of freedom respectively.





The hypothesis may therefore be tested by taking the ratio of the mean squares

$$(S_r - S_a)/(r-1) \quad\text{and}\quad S_a/(n-t),$$

which will have Fisher's variance-ratio distribution for $(r-1)$ and $(n-t)$ degrees of freedom, if the hypothesis is correct.

In particular, the minimum of $S$ for variations in all the $\beta$'s is given by

$$S_a = \sum_{j=1}^{n} (y_j - b_1 x_{1j} - \cdots - b_t x_{tj})^2,$$

where the $b$'s are the solutions of the normal equations

$$\sum_{j=1}^{n} x_{kj}(y_j - b_1 x_{1j} - \cdots - b_t x_{tj}) = 0 \qquad (k = 1, 2, \ldots, t).$$

That is, $S_a$ is the residual sum of squares for the sample regression of $y$ on $x_1, x_2, \ldots, x_t$.
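Similarly, $S_r$ is the minimum of

$$S' = \sum_{j=1}^{n} (y_j - \beta_0 x_{1j} - \beta_0 x_{2j} - \cdots - \beta_0 x_{rj} - \beta_{r+1} x_{(r+1)j} - \cdots - \beta_t x_{tj})^2 = \sum_{j=1}^{n} (y_j - \beta_0 x_{0j} - \beta_{r+1} x_{(r+1)j} - \cdots - \beta_t x_{tj})^2,$$

where $x_{0j} = \sum_{u=1}^{r} x_{uj}$, for variations in $\beta_0, \beta_{r+1}, \ldots, \beta_t$. That is, $S_r$ is the residual sum of squares for the sample regression of $y$ on $x_0, x_{r+1}, \ldots, x_t$.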

With these interpretations of the quantities $S_a$ and $S_r$, the identity (2.4) yields a ready practical test, in analysis of variance form, of the hypothesis $H_0$, i.e. of the homogeneity of $b_1, b_2, \ldots, b_r$. The estimate $s^2$ of $\sigma^2$ is, by the usual methods of regression analysis, $S_a/(n-t)$.
As a corollary to the foregoing general result, we may derive a test of significance of the departure from zero of a group of the partial regression coefficients. The hypothesis, $H_0'$ say, is that $\beta_1 = \beta_2 = \cdots = \beta_r = 0$ (i.e. that $\beta_0 = 0$ in $H_0$), a linear hypothesis of order $r$. $S_a$ is as before; we require the relative minimum $S_r'$, say, of $S'$ when $\beta_0 = 0$. $S_r'$ is, in fact, the residual sum of squares for the sample regression of $y$ on $x_{r+1}, x_{r+2}, \ldots, x_t$.

On the hypothesis $H_0'$, the two members on the right-hand side of the identity

$$S_r' \equiv S_a + (S_r' - S_a) \tag{2.5}$$

are distributed independently as $\chi^2\sigma^2$ with $(n-t)$ and $r$ degrees of freedom.

Hence, as before, the hypothesis may be tested by referring the ratio of the mean squares to the variance-ratio distribution.
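The general result of identity (2-4) can be sketched numerically. The Python snippet below uses NumPy. The function names `residual_ss` and `group_homogeneity_F` and the simulated data are illustrative assumptions, not taken from the paper. The snippet computes $S_a$ from the regression on all $t$ variates and $S_r$ from the regression on $x_0 = x_1 + \cdots + x_r$ together with $x_{r+1}, \ldots, x_t$. It then forms the variance ratio with $(r-1)$ and $(n-t)$ degrees of freedom. As in $E(y_j) = \beta_1 x_{1j} + \cdots + \beta_t x_{tj}$, no separate constant is fitted.

```python
import numpy as np

def residual_ss(y: np.ndarray, X: np.ndarray) -> float:
    """Residual sum of squares of the least-squares regression of y on the columns of X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return float(resid @ resid)

def group_homogeneity_F(y: np.ndarray, X: np.ndarray, r: int):
    """Variance ratio for H0: beta_1 = ... = beta_r (the first r columns of X).

    Returns (F, r - 1, n - t).
    """
    n, t = X.shape
    S_a = residual_ss(y, X)                              # absolute minimum
    x0 = X[:, :r].sum(axis=1, keepdims=True)             # x_0 = x_1 + ... + x_r
    S_r = residual_ss(y, np.hstack([x0, X[:, r:]]))      # relative minimum under H0
    df1, df2 = r - 1, n - t
    F = ((S_r - S_a) / df1) / (S_a / df2)
    return F, df1, df2

# Simulated data in which H0 holds for the first three coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
y = X @ np.array([0.5, 0.5, 0.5, -1.0]) + rng.normal(scale=0.3, size=40)
print(group_homogeneity_F(y, X, r=3))
```

Replacing the column $x_0$ by nothing, i.e. regressing on $x_{r+1}, \ldots, x_t$ alone, gives $S_r'$ and the corollary test of $H_0'$ with $r$ and $(n-t)$ degrees of freedom.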

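It may be noted at this stage that equation (2.2) may, without affecting the $\beta$'s, be written in the form

$$\eta_{ij} = \sum_{u=1}^{q} \beta_{ui} x_{uij} + \gamma_i + \alpha_j, \tag{2.6}$$

where

$$\gamma_i = \gamma_i^* - \bar\gamma^* \quad (i = 1, 2, \ldots, p), \qquad \alpha_j = \alpha_j^* + \bar\gamma^* + \mu^* \quad (j = 1, 2, \ldots, n),$$

and $\bar\gamma^* = \frac{1}{p}\sum_{i=1}^{p}\gamma_i^*$. Since $\sum_i \gamma_i = 0$, there remain in addition to the $\beta$'s only $(n+p-1)$ independent constants to be fitted.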





3. General case of p correlated samples with q independent variates 
As before, denoting the observed values by

$$(y_{ij}, x_{uij}) \qquad (u = 1, 2, \ldots, q;\ i = 1, 2, \ldots, p;\ j = 1, 2, \ldots, n),$$

the $y_{ij}$ are assumed distributed according to the specification (2.1). In virtue of (2.6), this may be written

$$y_{ij} = \sum_{u=1}^{q} \beta_{ui} x_{uij} + \gamma_i + \alpha_j + \epsilon_{ij}. \tag{3.1}$$


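Now

$$\eta_{ij} = \sum_{u=1}^{q} \beta_{ui} x_{uij} + \gamma_i + \alpha_j = \sum_{u=1}^{q}\sum_{k=1}^{p} \beta_{uk}\, z_{[p(u-1)+k]ij} + \sum_{l=1}^{p-1} \gamma_l\, z_{(qp+l)ij} + \sum_{r=1}^{n} \alpha_r\, z_{(qp+p+r-1)ij}, \tag{3.2}$$

where the $z$'s are 'dummy' variates taking the specified values

$$z_{[p(u-1)+k]ij} = \begin{cases} x_{ukj} & (i = k),\\ 0 & (i \ne k),\end{cases} \qquad z_{(qp+l)ij} = \begin{cases} 1 & (i = l),\\ 0 & (i \ne l,\ i \ne p),\\ -1 & (i = p),\end{cases} \qquad z_{(qp+p+r-1)ij} = \begin{cases} 1 & (j = r),\\ 0 & (j \ne r),\end{cases}$$

$(u = 1, 2, \ldots, q;\ i, k = 1, 2, \ldots, p;\ l = 1, 2, \ldots, p-1;\ r = 1, 2, \ldots, n)$.

That (3.2) is equivalent to (2.6) may readily be verified by inserting any particular values of $i, j$.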

The least squares estimates $b_{ui}$, $c_i$, $a_j$ of the constants in (3.1) are obtained by minimizing $\sum\sum (y_{ij} - \eta_{ij})^2$.† From (3.2), it follows that $b_{11}, b_{12}, \ldots, b_{qp}, c_1, \ldots, c_{p-1}, a_1, \ldots, a_n$ are in fact the estimates of the partial regression coefficients of $y$ on the $(qp+p+n-1)$ independent variates $z_1, \ldots, z_{qp}, z_{qp+1}, \ldots, z_{qp+p-1}, z_{qp+p}, \ldots, z_{qp+p+n-1}$ respectively. There are in all $np$ ($i = 1, 2, \ldots, p$; $j = 1, 2, \ldots, n$) observations in each variate. In terms of ordinary multiple regression analysis there will be $(qp+p+n-1)$ normal equations, which we require to solve for our $(qp+p+n-1)$ estimates.
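We now define the sample quantities

$$P_{uvik} = \sum_j (x_{uij} - \bar x_{ui.})(x_{vkj} - \bar x_{vk.}), \qquad P'_{uvik} = (\delta_{ik} - 1/p)\,P_{uvik},$$

$$Q_{uik} = \sum_j (x_{uij} - \bar x_{ui.})(y_{kj} - \bar y_{k.}), \qquad Q'_{uik} = (\delta_{ik} - 1/p)\,Q_{uik},$$

$$R_{ik} = \sum_j (y_{ij} - \bar y_{i.})(y_{kj} - \bar y_{k.}), \qquad R'_{ik} = (\delta_{ik} - 1/p)\,R_{ik},$$

$(u, v = 1, 2, \ldots, q;\ i, k = 1, 2, \ldots, p)$. Here $\delta_{ik}$ is the Kronecker delta ($\delta_{ik} = 1$ if $i = k$, $0$ otherwise), and a dot subscript denotes the mean, e.g. $\bar x_{ui.} = \frac{1}{n}\sum_j x_{uij}$.

Eliminating the $c_i$ and $a_j$ from the normal equations, we obtain after some reduction the set of equations for determining the $b_{ui}$:

$$\sum_{u=1}^{q}\Bigl(b_{ui}P_{uvii} - \frac{1}{p}\sum_{k=1}^{p} b_{uk}P_{uvki}\Bigr) = Q_{vii} - \frac{1}{p}\sum_{k=1}^{p} Q_{vik} \qquad (v = 1, 2, \ldots, q;\ i = 1, 2, \ldots, p), \tag{3.3}$$

or

$$\sum_{u=1}^{q}\sum_{k=1}^{p} b_{uk} P'_{uvki} = \sum_{k=1}^{p} Q'_{vik}. \tag{3.3'}$$

† For simplicity, $\sum\sum$ will be written for $\sum_{i=1}^{p}\sum_{j=1}^{n}$ throughout.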





In the notation of §2, $S_a$ is, in the present case, equal to $\sum\sum (y_{ij} - Y_{ij})^2$, where $Y_{ij}$ is the sample estimate of $\eta_{ij}$, i.e.

$$Y_{ij} = \sum_{u=1}^{q} b_{ui} x_{uij} + c_i + a_j.$$

After reduction, and using the relations (3.3), we obtain

$$\sum\sum (y_{ij} - Y_{ij})^2 = \sum_{i=1}^{p}\Bigl(R_{ii} - \sum_{u=1}^{q} b_{ui}Q_{uii}\Bigr) - \frac{1}{p}\sum_{i,k=1}^{p}\Bigl(R_{ik} - \sum_{u=1}^{q} b_{ui}Q_{uik}\Bigr), \tag{3.4}$$

or

$$\sum\sum (y_{ij} - Y_{ij})^2 = \sum_{i,k=1}^{p}\Bigl(R'_{ik} - \sum_{u=1}^{q} b_{ui}Q'_{uik}\Bigr) = \sum_{i,k=1}^{p} R'_{ik} - \sum_{i,k=1}^{p}\sum_{u,v=1}^{q} b_{ui}b_{vk}P'_{uvik}. \tag{3.4'}$$

We note in passing that, since $S_a$ is in this case based on

$$\{np - (qp+p+n-1)\} = \{(n-1)(p-1) - pq\}$$

degrees of freedom, the estimate of the residual variance $\sigma^2$ is

$$s^2 = \sum\sum (y_{ij} - Y_{ij})^2 \big/ \{(n-1)(p-1) - pq\}. \tag{3.5}$$
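The reduction from the dummy-variate regression (3-2) to the much smaller system (3-3') can be checked numerically. The Python sketch below uses NumPy, simulated data and the hypothetical function names `dummy_design` and `reduced_solution`. It fits (3-2) directly by least squares with the dummy variates defined above. It then solves (3-3'), built from $P'_{uvik} = (\delta_{ik} - 1/p)P_{uvik}$ and $Q'_{uik} = (\delta_{ik} - 1/p)Q_{uik}$, and confirms that both routes give the same $b_{ui}$ and the same residual sum of squares (3-4').

```python
import numpy as np

def dummy_design(x: np.ndarray) -> np.ndarray:
    """Design matrix of (3.2). x has shape (q, p, n) with x[u, i, j] = x_{uij}.

    Rows are ordered (i, j); columns are z_1..z_{qp}, then p - 1 gamma dummies, then n alpha dummies.
    """
    q, p, n = x.shape
    Z = np.zeros((p * n, q * p + p - 1 + n))
    for i in range(p):
        for j in range(n):
            row = i * n + j
            for u in range(q):
                Z[row, p * u + i] = x[u, i, j]        # z_{[p(u-1)+k]ij}, non-zero only when k = i
            if i < p - 1:
                Z[row, q * p + i] = 1.0                # gamma dummy for sample l = i
            else:
                Z[row, q * p : q * p + p - 1] = -1.0   # sample p carries -1 in every gamma dummy
            Z[row, q * p + p - 1 + j] = 1.0            # alpha dummy for row j
    return Z

def reduced_solution(x: np.ndarray, y: np.ndarray):
    """Solve (3.3') for b[u, i] from P' and Q'; also return S_a from the first form of (3.4')."""
    q, p, n = x.shape
    xc = x - x.mean(axis=2, keepdims=True)             # x_{uij} - xbar_{ui.}
    yc = y - y.mean(axis=1, keepdims=True)             # y_{ij} - ybar_{i.}
    w = np.eye(p) - 1.0 / p                            # delta_{ik} - 1/p
    P = np.einsum("uij,vkj->uvik", xc, xc) * w         # P'_{uvik}
    Q = np.einsum("uij,kj->uik", xc, yc) * w           # Q'_{uik}
    R = (yc @ yc.T) * w                                # R'_{ik}
    # Equation (v, i): sum_{u,k} b_{uk} P'_{uvki} = sum_k Q'_{vik}
    A = P.transpose(1, 3, 0, 2).reshape(q * p, q * p)  # rows (v, i), columns (u, k)
    rhs = Q.sum(axis=2).reshape(q * p)
    b = np.linalg.solve(A, rhs).reshape(q, p)
    S_a = R.sum() - np.einsum("ui,uik->", b, Q)
    return b, S_a

rng = np.random.default_rng(1)
q, p, n = 2, 3, 12
x = rng.normal(size=(q, p, n))
y = rng.normal(size=(p, n)) + 0.8 * x[0] - 0.3 * x[1]
b_red, S_a = reduced_solution(x, y)
coef, *_ = np.linalg.lstsq(dummy_design(x), y.reshape(-1), rcond=None)
print(np.allclose(coef[: q * p].reshape(q, p), b_red), S_a)
```

The reduced route needs only a $qp \times qp$ system instead of $(qp+p+n-1)$ normal equations. This is the practical point of eliminating the $c_i$ and $a_j$.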


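It is required to test the homogeneity as between samples of the respective partial regression coefficients of $y$ on $x_1, x_2, \ldots, x_q$. Consider those on $x_1$, the hypothesis, $H_1$ say, to be tested then being that $\beta_{11} = \beta_{12} = \cdots = \beta_{1p} = \beta_1'$, say. Consider the multiple regression of $y$ on the $(qp+n)$ independent variates $z', z_{p+1}, z_{p+2}, \ldots, z_{qp+p+n-1}$, where $z'_{ij} = \sum_{k=1}^{p} z_{kij} = x_{1ij}$. For this regression we have, for the expected value $\eta'_{ij}$ of $y_{ij}$,

$$\eta'_{ij} = \beta_1' z'_{ij} + \sum_{u=2}^{q}\sum_{k=1}^{p} \beta'_{uk}\, z_{[p(u-1)+k]ij} + \sum_{l=1}^{p-1} \gamma'_l\, z_{(qp+l)ij} + \sum_{r=1}^{n} \alpha'_r\, z_{(qp+p+r-1)ij}.$$

Suppose the sample estimates of $\beta'_{uk}$, $\gamma'_l$, $\alpha'_r$ and $\beta_1'$ are respectively $b'_{uk}$, $c'_l$, $a'_r$ and $b_1'$. The normal equations, after eliminating the $c'_l$ and $a'_r$, then reduce to

$$\begin{aligned} b_1'\Bigl(\sum_i P_{11ii} - \tfrac{1}{p}\sum_{i,k} P_{11ik}\Bigr) + \sum_{u=2}^{q}\Bigl(\sum_i b'_{ui}P_{u1ii} - \tfrac{1}{p}\sum_{i,k} b'_{uk}P_{u1ki}\Bigr) &= \sum_i Q_{1ii} - \tfrac{1}{p}\sum_{i,k} Q_{1ik},\\ b_1'\Bigl(P_{1vii} - \tfrac{1}{p}\sum_{k} P_{1vki}\Bigr) + \sum_{u=2}^{q}\Bigl(b'_{ui}P_{uvii} - \tfrac{1}{p}\sum_{k} b'_{uk}P_{uvki}\Bigr) &= Q_{vii} - \tfrac{1}{p}\sum_{k} Q_{vik} \quad (v = 2, 3, \ldots, q;\ i = 1, 2, \ldots, p),\end{aligned} \tag{3.6}$$

or

$$\begin{aligned} \sum_{i,k=1}^{p}\Bigl(b_1' P'_{11ki} + \sum_{u=2}^{q} b'_{uk}P'_{u1ki}\Bigr) &= \sum_{i,k=1}^{p} Q'_{1ik},\\ \sum_{k=1}^{p}\Bigl(b_1' P'_{1vki} + \sum_{u=2}^{q} b'_{uk}P'_{uvki}\Bigr) &= \sum_{k=1}^{p} Q'_{vik} \quad (v = 2, 3, \ldots, q;\ i = 1, 2, \ldots, p).\end{aligned} \tag{3.6'}$$

The regression estimates $b_1'$, $b'_{ui}$ ($u = 2, 3, \ldots, q$; $i = 1, 2, \ldots, p$) are obtained as the solutions of these $\{p(q-1)+1\}$ simultaneous equations.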




The residual sum of squares for this regression is $\sum\sum (y_{ij} - Y'_{ij})^2$, corresponding to the $S_r$ of (2.4). Proceeding as before we obtain

$$\sum\sum (y_{ij} - Y'_{ij})^2 = \sum_{i,k=1}^{p}\Bigl(R'_{ik} - b_1' Q'_{1ik} - \sum_{u=2}^{q} b'_{ui}Q'_{uik}\Bigr) = \sum_{i,k=1}^{p}\Bigl(R'_{ik} - b_1'^2 P'_{11ik} - 2b_1'\sum_{u=2}^{q} b'_{uk}P'_{1uik} - \sum_{u,v=2}^{q} b'_{ui}b'_{vk}P'_{uvik}\Bigr). \tag{3.7}$$

If the hypothesis $H_1$ is true, this sum yields an estimate of $\sigma^2$ based on

$$\{np - (pq+n)\} = \{n(p-1) - pq\}$$

degrees of freedom. To test the homogeneity of the $b_{1i}$ ($i = 1, 2, \ldots, p$), therefore, we consider the identity corresponding to (2.4):

$$\sum\sum (y_{ij} - Y'_{ij})^2 \equiv \sum\sum (y_{ij} - Y_{ij})^2 + \Bigl\{\sum\sum (y_{ij} - Y'_{ij})^2 - \sum\sum (y_{ij} - Y_{ij})^2\Bigr\}, \tag{3.8}$$

whose members have $\{n(p-1) - pq\}$, $\{(n-1)(p-1) - pq\}$ and $\{p-1\}$ degrees of freedom respectively. The ratio of the two mean squares for the right-hand members may be referred to the variance-ratio distribution, as in the case of a simple analysis of variance.

In similar manner we may test separately the homogeneity of $b_{2i}, b_{3i}, \ldots, b_{qi}$ ($i = 1, 2, \ldots, p$). Suppose now it is desired to test the overall homogeneity of all the

$$b_{1i}, b_{2i}, \ldots, b_{qi} \qquad (i = 1, 2, \ldots, p)$$

together, that is to test the hypothesis, $H_2$ say, that

$$\beta_{ui} = \beta_u \qquad (u = 1, 2, \ldots, q;\ i = 1, 2, \ldots, p).$$

This is, of course, equivalent to testing the homogeneity of the multiple regressions in the $p$ correlated samples. We require the residual sum of squares for the multiple regression of $y$ on the $(q+p+n-1)$ independent variates $z'_1, z'_2, \ldots, z'_q, z_{qp+1}, \ldots, z_{qp+p+n-1}$, where

$$z'_{uij} = \sum_{k=1}^{p} z_{[p(u-1)+k]ij} = x_{uij} \qquad (u = 1, 2, \ldots, q).$$

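Proceeding as before, the equations for determining the estimates $b_u$ of $\beta_u$ are

$$\sum_{u=1}^{q} b_u\Bigl(\sum_{i=1}^{p} P_{uvii} - \frac{1}{p}\sum_{i,k=1}^{p} P_{uvik}\Bigr) = \sum_{i=1}^{p} Q_{vii} - \frac{1}{p}\sum_{i,k=1}^{p} Q_{vik} \qquad (v = 1, 2, \ldots, q), \tag{3.9}$$

or

$$\sum_{i,k=1}^{p}\sum_{u=1}^{q} b_u P'_{uvik} = \sum_{i,k=1}^{p} Q'_{vik} \qquad (v = 1, 2, \ldots, q). \tag{3.9'}$$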

Denoting the sample estimate of the expected value of $y_{ij}$ for this regression by $Y''_{ij}$, the residual sum of squares is

$$\sum\sum (y_{ij} - Y''_{ij})^2 = \sum_{i,k=1}^{p}\Bigl(R'_{ik} - \sum_{u=1}^{q} b_u Q'_{uik}\Bigr) = \sum_{i,k=1}^{p}\Bigl(R'_{ik} - \sum_{u,v=1}^{q} b_u b_v P'_{uvik}\Bigr). \tag{3.10}$$

On the hypothesis $H_2$, this sum provides an estimate of $\sigma^2$ based on

$$\{np - (q+p+n-1)\} = \{(n-1)(p-1) - q\}$$

degrees of freedom. The appropriate identity, corresponding to (2.4), for testing the hypothesis $H_2$ is

$$\sum\sum (y_{ij} - Y''_{ij})^2 \equiv \sum\sum (y_{ij} - Y_{ij})^2 + \Bigl\{\sum\sum (y_{ij} - Y''_{ij})^2 - \sum\sum (y_{ij} - Y_{ij})^2\Bigr\}, \tag{3.11}$$

with $\{(n-1)(p-1) - q\}$, $\{(n-1)(p-1) - pq\}$ and $\{q(p-1)\}$ degrees of freedom respectively.

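On substitution from (3.10) and (3.4'), and simplification of the last member, the identity may be written

$$\sum_{i,k=1}^{p}\Bigl(R'_{ik} - \sum_{u,v=1}^{q} b_u b_v P'_{uvik}\Bigr) \equiv \sum_{i,k=1}^{p}\Bigl(R'_{ik} - \sum_{u,v=1}^{q} b_{ui}b_{vk}P'_{uvik}\Bigr) + \sum_{i,k=1}^{p}\sum_{u,v=1}^{q} (b_{ui} - b_u)(b_{vk} - b_v)P'_{uvik}. \tag{3.12}$$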
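Identity (3-12) rests on the normal equations (3-3') and (3-9'), and it is easy to confirm numerically. The following Python sketch uses NumPy, simulated data and the hypothetical function names `primed_quantities` and `homogeneity_H2`. It solves both sets of equations from $P'_{uvik}$, $Q'_{uik}$ and $R'_{ik}$ and checks that the three terms of (3-12) balance. It then forms the variance ratio for $H_2$ with $q(p-1)$ and $(n-1)(p-1)-pq$ degrees of freedom.

```python
import numpy as np

def primed_quantities(x: np.ndarray, y: np.ndarray):
    """P'_{uvik}, Q'_{uik}, R'_{ik} for x of shape (q, p, n) and y of shape (p, n)."""
    q, p, n = x.shape
    xc = x - x.mean(axis=2, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    w = np.eye(p) - 1.0 / p                                    # delta_{ik} - 1/p
    P = np.einsum("uij,vkj->uvik", xc, xc) * w
    Q = np.einsum("uij,kj->uik", xc, yc) * w
    R = (yc @ yc.T) * w
    return P, Q, R

def homogeneity_H2(x: np.ndarray, y: np.ndarray):
    q, p, n = x.shape
    P, Q, R = primed_quantities(x, y)
    # Separate regressions, (3.3'): rows (v, i), columns (u, k) hold P'_{uvki}.
    A = P.transpose(1, 3, 0, 2).reshape(q * p, q * p)
    b = np.linalg.solve(A, Q.sum(axis=2).reshape(q * p)).reshape(q, p)
    # Pooled regressions, (3.9'): sum_u b_u sum_{i,k} P'_{uvik} = sum_{i,k} Q'_{vik}.
    b_pooled = np.linalg.solve(P.sum(axis=(2, 3)).T, Q.sum(axis=(1, 2)))
    separate = np.einsum("ui,vk,uvik->", b, b, P)              # due to separate regressions
    pooled = np.einsum("u,v,uvik->", b_pooled, b_pooled, P)    # due to pooled regressions
    d = b - b_pooled[:, None]
    differences = np.einsum("ui,vk,uvik->", d, d, P)           # last member of (3.12)
    lhs = R.sum() - pooled                                     # (3.10)
    deviations = R.sum() - separate                            # (3.4')
    assert np.isclose(lhs, deviations + differences)           # identity (3.12)
    df_diff, df_dev = q * (p - 1), (n - 1) * (p - 1) - p * q
    F = (differences / df_diff) / (deviations / df_dev)
    return F, df_diff, df_dev

rng = np.random.default_rng(2)
x = rng.normal(size=(2, 3, 15))
y = rng.normal(size=(3, 15)) + 0.5 * x[0] + 0.2 * x[1]
print(homogeneity_H2(x, y))
```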





The formulae developed for the cases $H_1$ and $H_2$ above provide tests of the homogeneity of the partial regression coefficients in correlated samples. As particular cases, we may wish to determine whether the partial regression of $y$ on a particular variate, $x_1$ say, or the multiple regression of $y$ on all variates, is significant at all. The corresponding null hypotheses are $H_1'$, that $\beta_{1i} = 0$ ($i = 1, 2, \ldots, p$), and $H_2'$, that $\beta_{ui} = 0$ ($u = 1, 2, \ldots, q$; $i = 1, 2, \ldots, p$). The tests follow from the corollary to the general result of §2.

Referring to the regression equation (3.2), it is easily shown that the residual sum of squares for the regression of $y$ on $z_{p+1}, z_{p+2}, \ldots, z_{qp+p+n-1}$ is

$$\sum\sum (y_{ij} - Y_{(1)ij})^2 = \sum_{i,k=1}^{p}\Bigl(R'_{ik} - \sum_{u=2}^{q} b''_{ui}Q'_{uik}\Bigr) = \sum_{i,k=1}^{p}\Bigl(R'_{ik} - \sum_{u,v=2}^{q} b''_{ui}b''_{vk}P'_{uvik}\Bigr), \tag{3.13}$$

where

$$\sum_{u=2}^{q}\sum_{k=1}^{p} b''_{uk}P'_{uvki} = \sum_{k=1}^{p} Q'_{vik} \qquad (v = 2, 3, \ldots, q;\ i = 1, 2, \ldots, p), \tag{3.14}$$

and the residual sum of squares for the regression of $y$ on $z_{qp+1}, z_{qp+2}, \ldots, z_{qp+p+n-1}$ is

$$\sum\sum (y_{ij} - Y_{(0)ij})^2 = \sum\sum (y_{ij} - \bar y_{i.} - \bar y_{.j} + \bar y_{..})^2 = \sum_{i,k=1}^{p} R'_{ik}. \tag{3.15}$$

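Hence the appropriate identities corresponding to (2.5) are, for $H_1'$:

$$\sum\sum (y_{ij} - Y_{(1)ij})^2 \equiv \sum\sum (y_{ij} - Y_{ij})^2 + \Bigl\{\sum\sum (y_{ij} - Y_{(1)ij})^2 - \sum\sum (y_{ij} - Y_{ij})^2\Bigr\}, \tag{3.16}$$

with $(n-1)(p-1) - p(q-1)$, $(n-1)(p-1) - pq$ and $p$ degrees of freedom respectively; and for $H_2'$:

$$\sum\sum (y_{ij} - Y_{(0)ij})^2 \equiv \sum\sum (y_{ij} - Y_{ij})^2 + \Bigl\{\sum\sum (y_{ij} - Y_{(0)ij})^2 - \sum\sum (y_{ij} - Y_{ij})^2\Bigr\},$$

or

$$\sum_{i,k=1}^{p} R'_{ik} \equiv \sum_{i,k=1}^{p}\Bigl(R'_{ik} - \sum_{u,v=1}^{q} b_{ui}b_{vk}P'_{uvik}\Bigr) + \sum_{i,k=1}^{p}\sum_{u,v=1}^{q} b_{ui}b_{vk}P'_{uvik}, \tag{3.17}$$

with $(n-1)(p-1)$, $(n-1)(p-1) - pq$ and $pq$ degrees of freedom respectively.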

The relevant tests of significance of the regressions and of their homogeneity may be combined in an analysis of variance table. Thus, for testing the hypotheses $H_2$ and $H_2'$, we have:


Source of variation                       Degrees of freedom    Sum of squares                                                          Mean square
Due to pooled regressions                 $q$                   $\sum_{i,k}\sum_{u,v} b_u b_v P'_{uvik}$
Differences between separate and          $q(p-1)$              $\sum_{i,k}\sum_{u,v} (b_{ui}-b_u)(b_{vk}-b_v) P'_{uvik}$              $s_2^2$
  pooled regressions
Due to separate regressions               $pq$                  $\sum_{i,k}\sum_{u,v} b_{ui} b_{vk} P'_{uvik}$                         $s_1^2$
Deviations from separate regressions      $(n-1)(p-1)-pq$       $\sum_{i,k}\bigl(R'_{ik} - \sum_{u,v} b_{ui} b_{vk} P'_{uvik}\bigr)$   $s^2$
Total                                     $(n-1)(p-1)$          $\sum_{i,k} R'_{ik}$


Biometrika jo 







To test the significance of the multiple regressions (cf. (3.17)) we test the variance ratio $s_1^2/s^2$, with $pq$ and $(n-1)(p-1) - pq$ degrees of freedom. To test the homogeneity of the multiple regressions, the appropriate variance ratio (see (3.12)) is $s_2^2/s^2$, with $q(p-1)$ and $(n-1)(p-1) - pq$ degrees of freedom.

There remains the problem of estimating the separate variances and covariances of the individual partial regression coefficients. These are required in testing the significance of any particular one (from an assigned value) or of the difference between any two. Regarding $b_{ui}$, $c_l$, $a_j$ as the estimates of the partial regression coefficients in the regression equation (3.2), we obtain in the usual manner:

$$\text{estimated variance of } b_{ui} = s^2 g_{p(u-1)+i,\,p(u-1)+i}, \qquad \text{estimated covariance of } b_{ui} \text{ and } b_{vk} = s^2 g_{p(u-1)+i,\,p(v-1)+k}, \tag{3.18}$$

where $s^2$ is given by (3.5) and $(g_{\kappa\mu})$ ($\kappa, \mu = 1, 2, \ldots, qp+p+n-1$) is the inverse of the (symmetrical) matrix $(f_{\kappa\mu})$ of coefficients of $b_{ui}$, $c_l$, $a_j$ in the normal equations derived from (3.2). The determinant $|f_{\kappa\mu}| = F$, say, reduces to

$$F = n^{p-1} p^{n+1} D,$$

where $D = |P'_{p(u-1)+i,\,p(v-1)+k}|$ and $P'_{p(u-1)+i,\,p(v-1)+k} = P'_{uvik}$.

The cofactor of $f_{p(u-1)+i,\,p(v-1)+k}$ is similarly found to be

$$F_{p(u-1)+i,\,p(v-1)+k} = n^{p-1} p^{n+1} D_{p(u-1)+i,\,p(v-1)+k}.$$

Hence

$$g_{p(u-1)+i,\,p(v-1)+k} = D_{p(u-1)+i,\,p(v-1)+k}/D,$$

or we may write

$$\bigl(g_{p(u-1)+i,\,p(v-1)+k}\bigr) = \bigl(P'_{uvik}\bigr)^{-1}. \tag{3.19}$$
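Result (3-19) says two things. The $b_{ui}$ block of the inverse of the full normal-equation matrix of (3-2) is simply the inverse of the $qp \times qp$ matrix with entries $P'_{uvik}$, and the determinant factorizes as $F = n^{p-1}p^{n+1}D$. A short NumPy check on simulated data compares both sides. The data and the helper name `design_and_gram` are illustrative assumptions.

```python
import numpy as np

def design_and_gram(x: np.ndarray):
    """Return the (3.2) design matrix Z and the qp x qp matrix with entries P'_{uvik}."""
    q, p, n = x.shape
    Z = np.zeros((p * n, q * p + p - 1 + n))
    for i in range(p):
        for j in range(n):
            row = i * n + j
            Z[row, [p * u + i for u in range(q)]] = x[:, i, j]   # regression dummies
            if i < p - 1:
                Z[row, q * p + i] = 1.0                            # gamma dummies (+1)
            else:
                Z[row, q * p : q * p + p - 1] = -1.0               # sample p: -1 in every gamma dummy
            Z[row, q * p + p - 1 + j] = 1.0                        # alpha dummies
    xc = x - x.mean(axis=2, keepdims=True)
    P = np.einsum("uij,vkj->uvik", xc, xc) * (np.eye(p) - 1.0 / p)
    G = P.transpose(0, 2, 1, 3).reshape(q * p, q * p)              # row (u, i), column (v, k)
    return Z, G

rng = np.random.default_rng(3)
q, p, n = 2, 3, 8
x = rng.normal(size=(q, p, n))
Z, G = design_and_gram(x)
f = Z.T @ Z                                   # the matrix (f_{kappa mu})
g = np.linalg.inv(f)                          # the matrix (g_{kappa mu})
print(np.allclose(g[: q * p, : q * p], np.linalg.inv(G)))
print(np.isclose(np.linalg.det(f), n ** (p - 1) * p ** (n + 1) * np.linalg.det(G)))
```

Both checks are instances of the standard partitioned-matrix results. The regression block of the inverse equals the inverse of the Schur complement, which here is the matrix of $P'_{uvik}$. The full determinant is the product of the dummy-block determinant and the Schur complement determinant.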

It is known that, under the assumptions of normality and no error in the $x$'s, the $b_{ui}$ are normally distributed about means $\beta_{ui}$, for each $b_{ui}$ may be expressed as a linear function of a number of independent normal variates. Hence, given the estimates of their variances and covariances (3.18), any coefficient, or the difference between any two, may be tested for significance by 'Student's' $t$-test. This enables us to test the significance (from zero, say) of any $b_{ui}$, of the difference $(b_{ui} - b_{uk})$ ($k \ne i$), or of the difference $(b_{ui} - b_{vi})$ ($v \ne u$).
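For example, the difference between the regressions of $y$ on $x_u$ in samples $i$ and $k$ can be referred to $t$ with $(n-1)(p-1)-pq$ degrees of freedom. Write $\kappa = p(u-1)+i$ and $\mu = p(u-1)+k$. Then

$$t = \frac{b_{ui} - b_{uk}}{\{s^2(g_{\kappa\kappa} + g_{\mu\mu} - 2g_{\kappa\mu})\}^{1/2}}.$$

The Python sketch below (NumPy) computes this statistic. It obtains the $g$'s from (3-19) as the inverse of the matrix of $P'_{uvik}$, and $s^2$ from (3-5). The function name `difference_t`, the 0-based indices and the simulated data are illustrative assumptions.

```python
import numpy as np

def difference_t(x: np.ndarray, y: np.ndarray, u: int, i: int, k: int):
    """t statistic for b_{ui} - b_{uk} (0-based indices), using (3.3'), (3.5), (3.18), (3.19)."""
    q, p, n = x.shape
    xc = x - x.mean(axis=2, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    w = np.eye(p) - 1.0 / p
    P = np.einsum("uij,vkj->uvik", xc, xc) * w             # P'_{uvik}
    Q = np.einsum("uij,kj->uik", xc, yc) * w               # Q'_{uik}
    R = (yc @ yc.T) * w                                    # R'_{ik}
    G = P.transpose(0, 2, 1, 3).reshape(q * p, q * p)      # row (u, i), column (v, k)
    g = np.linalg.inv(G)                                   # (g) = (P')^{-1}, equation (3.19)
    b = (g @ Q.sum(axis=2).reshape(q * p)).reshape(q, p)   # solves (3.3')
    df = (n - 1) * (p - 1) - p * q
    s2 = (R.sum() - np.einsum("ui,uik->", b, Q)) / df      # equation (3.5)
    a, c = p * u + i, p * u + k
    var_diff = s2 * (g[a, a] + g[c, c] - 2.0 * g[a, c])    # from (3.18)
    return (b[u, i] - b[u, k]) / np.sqrt(var_diff), df

rng = np.random.default_rng(4)
x = rng.normal(size=(2, 3, 10))
y = rng.normal(size=(3, 10)) + 0.7 * x[0]
print(difference_t(x, y, u=0, i=0, k=1))
```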

4. Special cases

4.1. Single independent variate ($q = 1$)

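Denote the observed values by $(y_{ij}, x_{ij})$ ($i = 1, 2, \ldots, p$; $j = 1, 2, \ldots, n$), and write

$$P_{ik} = \sum_j (x_{ij} - \bar x_{i.})(x_{kj} - \bar x_{k.}), \qquad P'_{ik} = (\delta_{ik} - 1/p)\,P_{ik},$$

$$Q_{ik} = \sum_j (x_{ij} - \bar x_{i.})(y_{kj} - \bar y_{k.}), \qquad Q'_{ik} = (\delta_{ik} - 1/p)\,Q_{ik},$$

$$R_{ik} = \sum_j (y_{ij} - \bar y_{i.})(y_{kj} - \bar y_{k.}), \qquad R'_{ik} = (\delta_{ik} - 1/p)\,R_{ik} \qquad (i, k = 1, 2, \ldots, p),$$

where $\delta_{ik}$ is, as before, the Kronecker delta,

$$\delta_{ik} = 1 \ (k = i), \qquad \delta_{ik} = 0 \ (k \ne i).$$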


The required formulae follow immediately from the results of the previous section, 
references to which are made in the left-hand margin. 

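The equations for determining the $b_i$ ($i = 1, 2, \ldots, p$), and the estimate $s^2$ of the residual variance, are:

$$\text{(3.3)}\qquad b_iP_{ii} - \frac{1}{p}\sum_{k=1}^{p} b_kP_{ki} = Q_{ii} - \frac{1}{p}\sum_{k=1}^{p} Q_{ik} \quad (i = 1, 2, \ldots, p), \tag{4.11}$$

or

$$\text{(3.3')}\qquad \sum_{k=1}^{p} b_kP'_{ik} = \sum_{k=1}^{p} Q'_{ik}, \tag{4.11'}$$

$$\text{(3.4')}\qquad \sum\sum (y_{ij} - Y_{ij})^2 = \sum_{i,k=1}^{p}(R'_{ik} - b_iQ'_{ik}) = \sum_{i,k=1}^{p}(R'_{ik} - b_ib_kP'_{ik}), \tag{4.12}$$

$$\text{(3.5)}\qquad s^2 = \sum\sum (y_{ij} - Y_{ij})^2 \big/ \{(n-1)(p-1) - p\}. \tag{4.13}$$

To test the homogeneity of the $b_i$, i.e. the hypothesis that $\beta_1 = \beta_2 = \cdots = \beta_p = \beta$, say, we require the estimate $b$ of $\beta$, and the residual sum of squares $\sum\sum (y_{ij} - Y'_{ij})^2$:

$$\text{(3.6)}\qquad b\Bigl(\sum_{i} P_{ii} - \frac{1}{p}\sum_{i,k} P_{ik}\Bigr) = \sum_i Q_{ii} - \frac{1}{p}\sum_{i,k} Q_{ik}, \tag{4.14}$$

or

$$\text{(3.6')}\qquad b\sum_{i,k=1}^{p} P'_{ik} = \sum_{i,k=1}^{p} Q'_{ik}, \tag{4.14'}$$

$$\text{(3.7)}\qquad \sum\sum (y_{ij} - Y'_{ij})^2 = \sum_{i,k=1}^{p}(R'_{ik} - bQ'_{ik}) = \sum_{i,k=1}^{p}(R'_{ik} - b^2P'_{ik}). \tag{4.15}$$

To test the homogeneity of the $b_i$, we employ the identity, corresponding to (2.4),

$$\text{(3.12)}\qquad \sum_{i,k=1}^{p}(R'_{ik} - b^2P'_{ik}) \equiv \sum_{i,k=1}^{p}(R'_{ik} - b_ib_kP'_{ik}) + \sum_{i,k=1}^{p}(b_i - b)(b_k - b)P'_{ik}, \tag{4.16}$$

with $n(p-1) - p$, $(n-1)(p-1) - p$ and $p - 1$ degrees of freedom respectively.

To test whether the regression is in fact significant, the appropriate identity is

$$\text{(3.17)}\qquad \sum_{i,k=1}^{p} R'_{ik} \equiv \sum_{i,k=1}^{p}(R'_{ik} - b_ib_kP'_{ik}) + \sum_{i,k=1}^{p} b_ib_kP'_{ik}, \tag{4.17}$$

with $(n-1)(p-1)$, $(n-1)(p-1) - p$ and $p$ degrees of freedom.

The two tests may be combined in analysis of variance form: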


Source of variation 

Degrees of 
freedom 

! 

•Sum of squares 

Mraui ■; 
Htpnw j 

i 1 

Duo to pooled regression 

Differences between separate 

1 

b 1 2?2 /’« 

i>k l 

1 

| 

regressions and pooled regression 

P ~1 - 

{h { — b) {h^ — h) p0. 

i, k 1 

•' ! 

Due to separate regressions 
Deviations from separate 

V 

b,M’« 

4 ! 

regressions 

(n-1) (jj-l)-p 

SB (iQ-b^r,,) 

i , /c*«X 


Total 

(n-l)(p-l) 

li 2 It,* 

! 




36 The estimation and comparison of residual regressions 

To test the significance of the regression, and the homogeneity of the separate regressions, we 
test the variance ratios s\js 2 and sfjs 2 (with degrees of freedom as indicated) respectively. 
Finally, we have: 

estimated variance of b i — s 2 g 

estimated covariance of 6 { and b k = s 2 g {k ) 

where the matrix (g ik ) is given by 

(3-19) (g ik )={P'ik)- 1 - . (4-19) 


(*,fc= 1,2 


(4-18) 


4-2. The case of two samples (p = 2) with q independent variates 
Some simplification of the formulae is possible when p = 2. We note that 

Khh - *( -1 Y +k P» vi *, Q'uik = i(-i) i+fc Quik, R'ik = k(~ i) i+fc Rik (i, = 1.2) 

From the relevant equations of § 3 we derive the following results: 

(3-3') S Si(-1 ) i+k b uk P um = S*(-l ) i+k Q v{k , 

«=1 fc=l A=1 

i.e. E S(-l ) k b uk P mki = E (-1 ) k Q vik (»-1,2,=1,2), (4-21) 

«=i a=i /;=i 

(n-2g-l)s 2 = SS(j/«-^) 2 = i ES (-l) <+fc k fc - E 6 ui <2,J 

i i i, *= 1 V «= 1 / 

= i EE (-l) i+fc (iJ lfc - EE MiAi*)> (4*22) 

1, fc=l \ 11,11=1 / 

EE (-W&i.-W E = EE (-i)* + *Gn fc , 

.fill <,*=1 \ 11=2 / tfc=l 


(3-6') 


S (-I) fc (6iP lrtt + E S (-!)*&,* (v = 2,3 ,i = l, 2 ), 

* = 1 \ « = 2 / A-l 


(4-23) 


(3-7) 


ES (Vij 

* i 

2 / 

-ly) a =4SE (-1 )**( 

i. k=l \ 


= 4 ES (-i)*+^f 

i, fc=l \ 

(3-9') 

ES (-1V + ^ U P„„ W = 

i,k= 1 

(3*10) 

EE (da- Y'ljf 

t j 


(4-24) 

(4-25) 


= 4 EE (-i) i+fc (p ifc - EE b u b v P„ . (4-26) 

i, <i=i \ 11 , 11=1 i 

To test the homogeneity of the b ki (i = 1,2), i.e. to test the significance of the difference 
I b n 'M> we employ the identity (3-8) where now EE {y ti - Y'af and ES (Vij - Yu) 2 axe 

given by (4-24) and (4-22); the degrees of freedom being n-2q, n -2 q- 1 and 1 respectively. 
To test the overall homogeneity of all the b lit b 2i , ...,b Qi (i = 1,2), which is equivalent to 



A. H. Carter ® ' 

testing jointly the significance of the differences (b ul -b u2 ) (u = 1,2,the appropriate 
identity corresponding to (2-4) is 

* SS(-i) i+fc K- SS (-W*,*- 2 £ 

0 v {,*- 1 \ «.»*! / <•*“! ' 

(o*l-oj 2 

+ i S S S E ( - f4 ' 2? ) 

i, A:=l u, » = 1 

with w-g-l,»-2g-l and g degrees of freedom respectively. 

In practice, the following alternative method of evaluating and testing the partial 
regression coefficients in two correlated samples may be found useful. 

h uV b u2 (u = 1,2 are identical with the estimates of the partial regression coefficients 
of?/(= ?/j - y 2 ) on the 2q independent variates x' ul ( = x ul ), x' u2 ( = — x u2 ). The normal equations 
for estimating the coefficients in this regression are 

2 2 Kik S *«*.) ( x vij~ x vi.) = S ( x vij~ x vi.) ) (u = 1, 2, t = 1,2). 

fc-1 U=1 3 3 

Substituting x ukj = (- 1 ) fc -i a;„ w , yj = £ (-1 ) fc_1 ?/ &J , 

fc=l 

and similarly for themeans, these reduce to (4-21), It may readily be shown that all the tests 
relating to the b u , b 2i , ...,b vi (i = 1,2) when considered in this manner are identical with those 
given above. 

4-3. Two samples with a single independent variate (p = 2 ,q -■ 1) 

The appropriate results follow easily from 4-1. Tho two equations (4-11) (i — 1,2) may ho 
immediately solved for b v 6 2 , giving 

b\ = D 1 {P 22 (<2h — Qn) + P\i(Qvi ~ C?ai)}> 

b 2 = D-'{I\ 1 (Q 2i -Q 2l ) + P u (Q 11 .-Q n )}J ' 


i — P\i Aa ~ I’lv 


(■*•31) 

( 4 - 32 ) 


From (4-12), (4T3) we have 

(n - 3) a* = ££ (v M - = i{(R u + P 22 - 2 R u ) - ( b , Q n - l h Q u - l h Q n + 6, Q i2 )} 

= U(Rn + Rzz - 2P 12 ) - (b\P 1L -l- b\ P M - 2/3i 6 3 P 12 )}. (4-33) 

The estimate of the pooled regression is, from (4-14), 

b = [b x (P n - P J2 ) + b 2 (P 22 - P2l)}l(P n + P„ ~ 2/jj|) 

~ (Qii — Qii—Qii + Qt 2 )l(P n + P n -2P l2 ). {4-34) 

The identity (4-16) from which is derived the test of the significance of the difference I It. b, I 
becomes * 1 

mu + - 2 R u ) - 5«(P U + P 22 - 2P 12 )} s *{(P U + P 22 - 2 R n ) - (blP u + b\P n - 26, 6,P l9 )} 

+ £{( 6 i - b 2 ) 2 Dj(P n + P 22 - 2P, 2 )} (4-35) 

with degrees of freedom n- 2, n~ 3 and 1 respectively. 

Finally, (4T9) becomes 

(9 } = ( \Pu ~\P U \-' _ /2P 22 D-i 2P v ,D~ 1 \ 

’ W*- IPJ ~ ISP,,!) -1 2P,.D~ l ) 


( \Pu 

( 

V-Pxa 

iPj ~ \ 


( 4 - 30 ) 



38 The estimation and comparison of residual regressions 


4'4, The case of independent samples 

If in the specification (2-1) we put a* = 0, we have the population model appropriate to the 
case of p independent samples with q independent variates, where the residual variances are 
equal, Proceeding in a manner similar to that of § 3, we easily derive the following known 
results. It is instructive to compare them with the corresponding results for related samples, 
which are referred to in the left-hand margin. 

The estimates of the partial regression coefficients are given by 


(3-3') (*“1.2.1.2. . p). 

1 i=l 

For the estimate s 2 of the (constant) residual variance cr 2 we have 

{Pin- 1) ~n}s* = 22 (Vij-Yy ) 2 = 2 Ua~ 2 KiQuii) 

i j i \ M=1 / 

= 22 KiKA 

i \ u , u=i 

To test the homogeneity of the b u (i = 1,2, we obtain 

*i,2-Pu« + 2 2 KiTulii - YiQli'o 


(4*41) 


(3-4') 


(3-8') 

i i 

i 


. Ylvii 4 2 b-uiPmii 

= Qm (*-2,3,. 

and 




22(i/ iJ -l r «) 2 =2k- 

d ) l.Qlii~ 2 friiiQwn) 

(3-7) 

i j i \ 

«=2 1 


(4-42) 


(4'43) 


= 'Z\Ru-b'i. p uii-2b' 1 . 2 KuKiii- 22 b' vi b' vt P uvii] , 

ii=2 u,v=2 


(4-44) 


and employ the identity (3-8), noting that 22 (Vy- Yy) 2 is now based on (np - 1 )-pq 

i i 

degrees of freedom, and the right-hand members have {np—p)—pq and p— 1 degrees of 
freedom respectively. 

Finally, for the overall test of homogeneity among samples of all the 

^li> ^2i> • • • >bqi {i — 1,2, 

we derive the results: 


(3-9') 

(3-10) 


2 2 K.Tmii ~ 2Q 1)11 ( v — 1; 2, ..., 9), 

i u=l { 

22(y iJ -ry 2 = s(ij. i - | 

1 I i \ U=1 I 

= ?(*«- 22 K.hAvX 

x \ n, 3)=! / 


(4-45) 


(4'46) 


The appropriate identity corresponding to (2-4) is then 

?(*«- £5, b - b - p <4■? ( s «- 


(3-12) 


+ 2 22 Ki-K.)(b vi -b V ')P m 

X u , V = 1 


(4-47) 


with (np-p)-q, (np -p)-pq and q(p~ l) degrees of freedom respectively. 



39 


A. H. Carter 

In the simple case q = 1, the above results reduce to the well-known formulae 

bi Pu~ Qu (i = lj 2,..-,p)j 

(»-2)a» = EEG/y-^i) 2 - 2 

i j i 

b£ fyi ~ 2 Qii’ 

i i 

22 (2 

1 j i 

the appropriate identity from which is derived tlie test of the homogeneity of the b { being 

2 (Bu - b ip u) = 2 (Rii-b\l\i) + 2 {bi- b)*P u , 

i i i 

with p[n -1) -1, p{n -1) -p and p -1 degrees of freedom. 

From the foregoing, it is seen that the results for independent samples are identical with 
those for non-independent samples provided we write 

P'-utik — $ik P ii\yik> Quik — &ik Quik’ R\k-8ik P ik' 

which are equivalent to P mik = Q uik = R ik = 0 (i+i), in the former case. Conversely, given 
the results for p independent samples, we may readily derive those appropriate to p related 

V 

samples by replacing P vvii by Z^P' u>) { k , etc. 

5. Discussion 

Provided the specification (2-1) correctly describes the population from which tho observed 
values are drawn, the estimates derived above are 1 best ’ estimates in tho maximum likelihood 
sense, and the resulting testa are exact tests. It is important to bear in mind tho underlying 
assumptions implied in the specification, the validity of which necessarily conditions the 
applicability of the results. 

The assumption of normality of residual variation is one frequently accepted, anil one 
which has a satisfactory empirical basis in many biological fields. 

The condition that the correlation between samples is due to an additive effect common to 
corresponding members would appear to be a reasonable one in the type of eases considered 
in § 1. If there were grounds for believing the samples to be related in some ot her manner, 
a suitable transformation might be applied, to ensure that the condit ion held. 

There is the further requirement that the effects of tho two factors of dtiHHifieation 
samples on the one hand, individual sample members on the other —be independent., i.e. that 
there is no interaction. In the sugar beet experiment quoted in § 1, for example, t his requires 
that the several localities do not exert differential effects on the different varieties, i.e. that, 
locality-variety interaction is negligible. Whether the assumption is justifiable or not will 
depend on the circumstances of any particular problem. Further consideration would bo 
required for the more general case where interaction exists. 

Implicit throughout this paper has been the supposition that the independent variates 
(the x s) were not subject to error. In deriving the tests of homogeneity and significance, wo 
have in effect considered the sampling distribution of a test criterion (a variance-ratio 



40 The estimation and comparison of residual regressions 

‘Student’s’ t) for repeated sampling from a population in which the x’b were held fixed. The 
sampling distribution of this criterion, however, is clearly seen to be independent of the m’s. 
Provided the relation (2- 1) holds, therefore, the tests will be valid irrespective of the distribu¬ 
tion of the x’b. 

An assumption which might appear more difficult to justify in practice is that concerning 
the equality of the residual variance for each sample. Since this variance is estimated for the 
totality of samples but not for each separately, no standard test can be made of the homo¬ 
geneity of the separate residual variances. The assumption of equal residual variance is, of 
course, the same as that made in testing the homogeneity of the means of correlated samples 
in an analysis of variance, 

A case which requires special consideration is that where one or more of the independent 
variates are the same for all samples. If the formulae given in the preceding sections were 
applied in this case, it would be found that the equations for estimating the regression coeffi¬ 
cients were indeterminate, and we conclude that the specification of the population adopted 
is inappropriate. To satisfy the criterion of equal residual variances, in fact, the correct 
specification would require that the regression coefficients on a variate which is the same for 
all samples are themselves the same. On logical grounds, it seems reasonable to suppose that 
the effect on the dependent variate of such a common independent variate, as measured by 
the relevant regression coefficient, is the same in all samples. In the general case (§3), if 
x Uj = x lj (i = 1,2, for example, the appropriate specification of the population would 

require that say (i = 1,2. p). Proceeding as before, tests of the homogeneity of 

the regression coefficients on those of the independent variates whioh did vary from sample to 
sample could be derived. 

As mentioned in the Introduction, the problem of comparing the regression coefficients 
from two correlated samples has been considered by Yates. It is of interest to compare the 
method proposed by him with that developed in this paper: in the first of the examples which 
follow, where both methods have been applied, it will be observed that very different results 
are obtained. 

In the first place, it is to be noted that the actual estimates of the regression coefficients to 
be compared are themselves different, In the treatment adopted in this paper, the regression 
coefficients as estimated may be regarded as measuring the net effects of the independent 
variates, allowance being made for the correlation between samples. In Yates’s method, the 
estimates, computed for each sample separately, measure rather what might be termed 
' crude ’ regression effects, no account being taken of the correlation. To assess the validity of 
these estimates and of the derived tests, we must examine the underlying assumptions 
(concerning the parent population) which have been made. 

With Yates’s approach, the two samples, comprising, in the general case of q independent 
variates, the observed values {y ip x ni} ) (u = 1,2 i = 1,2; j = 1,2,. ,.,n) are supposed 
drawn from two populations specified by 

Vi) = E PuiXuij + Yt+l^+e'ij (i = l,2;j= 1,2,...,»), (5-1) 

where fi ui , yf, y* aTe as in (2-1), the e',- are random residuals normally distributed about zero 
means with (different) variances of (i =1,2), and e' ip are correlated, their covariance 
being k. 



41 


A, H. Cabteb 

The estimates b ui of /?* are taken to be the solutions of the normal equations (appropriate to 
independent samples) 

S K&uii-Vui)} = 0 (v = 1,2 ,i = 1,2), (5'2) 


or (compare (441)) 


9 

2 bui-^uuii ~ Quit' 

U =>1 


The difference | b ul - 6„ a | (« = 1,2. q) is then normally distributed. Its standard error U 

not of course known, hut must be estimated from the data. Were this estimate a simple one 
based on a known number of degrees of freedom, an exact bteat could be applied to test the 
significance of the difference. In fact, the estimate is a composite one, witlx component parts 
based on different degrees of freedom. Only if the samples were sufficiently large, could the 
approximate normal test justifiably replace the exact t-test, and the significance of the 
difference j b uX —b. d2 [ be tested. 

The precise interpretation of the estimates b ui given by (5-2) is, however, open to doubt, 
Were the samples independent they would be the maximum likelihood estimates of the 
partial regression coefficients P ui \ they are not maximum likelihood estimates under the 
specification (54). The maximum likelihood equations for this case are in fact complicated, 
not capable of algebraic solution even in the simplest case, q — 1. 

Comparing now the specification (54) with (24), we see that, if they are to be equivalent, 


e ij ~~ a j+ e -y> ( ,r> '3) 

where e' u , e' 2J have different variances cr( 2 , cr' 2 , and are correlated, while a 2J have the same 
variance cr a , and axe independent. The population model adopted in § 2 is thus leas general 
than that of Yates: on the one hand the correlation between the residuals is assumed clue to 
a common additive factor; and on the other, the residual variances are assumed equal. The 
first restriction has already been discussed; as regards the second, further consideration 
would be required if the residual variances were presumed unequal, It may be remarked that, 
foT the case of two samples only, the maximum likelihood estimates of the regression coeffi - 
oients when the specification (24) is extended to allow for different residual variances are in 
fact the same as those given in § 4-2. 

In conclusion we note that, provided the assumptions made concerning the parent popula¬ 
tion (§ 2) are valid, the method of estimating and comparing the regression coefficients from 
correlated samples presented in this paper has the following advantages: (1) the estimates are 
best estimates from the point of view of maximum likelihood, (2) the resulting tests are 
exact tests, and (3) the results are applicable to any number of samples. 

It is hoped to deal in a further paper with (1) the application of the theory in covariance 
analysis, (2) the case where interaction is assumed to exist, (3) the assumption of unequal 
residual variances, (4) the extension of the theory to a three-factor classification, (5) the case 
where one or more of the independent variates are the same for all samples, and (0) curvilinear 
regression. 


7. Numerical illustrations 

Example 1. The following data, extracted from the Rothamsted Experimental Station 
Annual Reports, relate to the Broadbalk wheat plots: 



42 


The estimation and comparison of residual regressions 

y denotes yield of grain (in bushels per acre). 

* denotes amount of straw (in cwt. per acre). 

I represents treatment 2, receiving farmyard manure (treatment 2A after 1922). 

II represents treatment 13, receiving nitrogen, potash and phosphates (artificial fertilizer). 


Yearf 

I 

II 

Vi 


v% 

££ 2 

1908 

38-6 

32-2 

36-0 

29-6 

1910 

27-9 

38-3 

26-3 

34-0 

1912(1) 

IQ-9 

17-6 

6-1 

9-5 

1914(1) 

30-7 

36-6 

19-2 

21-6 

1916 (1) 

33-3 

41-3 

26-1 

36-8 

1918 (2) 

30-8 

38-8 

20-3 

27-2 

1920 (2) 

28-3 

38'4 

24-9 

29-6 

1922 (2) 

32-9 

31-8 

24-4 

26-9 

1924 (2) 

10-3 

18-6 

16-0 

21-2 

1926 (1) 

6-8 

24-6 

9-3 

26-4 

1928 (2) 

41-1 

61-3 

55-2 

66-2 

1930 (3) 

261 

60-0 

29-2 

68-1 

1932 (3) 

10-1 

42'6 

11-0 

46-3 

1934 (3) 

23-3 

63-8 

28-6 

60-8 

1936 (3) 

7-3 

34'8 

9-4 

26-4 

1938 (4) 

38-2 

41'0 

42-5 

31-9 


•f Subsequent to 1910, the main treatment plots were subdivided. The numbers in brackets indicate 
the plots to which the figures given relate: (1) lower portion of field, (2) upper portion of field, (3) subplot 
V, (4) subplot VI. It is recognized that since the figures given under I and II do not in fact all relate 
to the same plots, the methods applied may not be strictly appropriate; though since the treatments are 
the same, the data are of value for the purpose of illustration. 


Does the regression of yield of grain on amount of straw differ significantly for the two 
manurial treatments'! Since the corresponding pairs of observations in each sample relate to 
the same year, the samples cannot be regarded as independent. It seems reasonable to 
suppose, however, that the relation between the two samples is due to a common additive 
‘ year ’ factor, i.e. the a p and we apply the results of § 4- 3. From the given data we find 


P n = 2,423-97, 

Q n = 814-25, R n = 1,987-06, 

P 12 = 2,527-68, 

Q n = 1,363-84, R n = 1,875-19, 

P 22 = 3,119-18, 

Q n = 561-61, P 22 = 2,547-76, 

D- 1 = 0-8535 x 10- 8 , 

1,545-59. 

From (4-31), we obtain b x - 

0-660, b t = 0-850. 


From (4-33), = 1T945 (13 degrees of freedom). 


( t = f 0 ' 0053 0-0043N 

\0-0043 0-0041/• 

K) = 2-62*. f (bj) = 3-82**, 1'91- 

We conclude that b x and 6 2 are significantly different from zero (5 and 1 % levels respectively), 
while the difference (fco-td approaches significance (5% point = 2-16). 


From (4-36), 
Hence finally 4 


I We shall use the markings *, ** and *** 
levels. 


, respectively, to denote significance at the 5, l and O-l 


0 / 

/o 




A. H. Cakter 


43 


Yates's method. To avoid confusion, the regression estimates will be denoted by b[ and b' t . 
In the present notation, for samples of size n we have (writing var for 1 estimate of variance'} 

b[ = Q ul Pn> var(6 1 ) = (Ru—b L Q n )l(n — 2) P 1X , = h(/\/{var (6 X )} 

(n — 2 degrees of freedom), 

and similarly for b 2 . 

The estimate of the variance of the difference (b 2 - b{) is given by 


var (b 2 - b[) - var (b[) + var (b' 2 ) - 


2 R 


it 


(»-3)P u P 22+ Pf 2 


(i^-KQit-KQn+KW- 


For the example quoted, therefore, we derive the results 


= 0-336, var (6j) = 0-0505, t^)— 1-49 7 

4 ;. 0-496, var(4j| = 0'0408, ^ _ 2-4S*/ 14 d 'e re « of:freedom (3 % point i, 2-146) 
(^a —*1) = 0-1596, var (6'-hi) = 0-0159. 

The estimate of the standard error of [b[ - b’J is therefore 0-1261. On the assumption that 
( b t~ b i) is normally distributed with true variance 0-0159 the difference is clearly far from 
significant. To sum up, b' 2 is significant at the 6 % level, but neither h\ nor the difference 
(b'i-b'j) is significant. 

Example 2. In an experiment to investigate wool growth, measurements of the arm of 
certain tattooed squares (initially the same size), and of the weight of wool produced from 
these squares, were obtained for a number of sheep, in successive seasons. In the following 
table are the figures relating to four such squares, on different parts of the same sheep, fur the 
summer season. 


y d6n l 63 tho 'clean^weight of the wool cut from the tattooed square. * denotes the area of 


thesquare. I, II, HI, IV denote fore, back, hip, and britchrogimw roapectiveiy. 

Ill I iv 


Sheep 

no. 


Vi 


II 


2/a 


2/s 


*3 


2/i 


1 

2 

3 

4 
6 
0 

7 

8 
9 

10 

11 


82 

84 

100 

96 

88 

08 

113 

136 

80 

144 

64 


71 

72 
107 

02 

82 

48 

70 

102 

61 

92 

6S 



117 
142 
198 
160 
164 

79 
167 
190 

118 
101 

80 


98 

09 

in 

74 

117 

61 

08 

116 

68 

100 

62 


78 

89 

148 

98 

98 

70 

114 

130 

93 

140 

75 


70 
74 

102 

71 
85 
67 
67 
97 
60 

110 

00 


m 

07 

90 

05 

02 

00 

85 

05 

54 

97 

42 


72 
70 

106 

02 

73 
51 
fill 

79 
03 

80 
68 





44 


The estimation and comparison of residual regressions 

R ik Qik 



4 4 4 

2 Q'ik = Qii-i 2 Qik> hence 2 Qik = 5229-4(18,089) = 706-7, 

fc=l Jc=l k=l 

4 

2 %e = 6931-4(17,764) = 2490-0, 

k= 1 
4 

2 Q' 3fc = 4624-J(18,360) = 34-0. 

fc=i 

4 

2 = 1603-4(12,201) = -1447-3, 

fc=i 

22^ = 2^-422^* = 13,996-4(47,627) = 2088-25, 

i k i i k 

22% s = 2<9i i -422Q i j ; = 18,387-4(66,414) = 1783-5, 

i k i i k 

22-S; fc = 2-Rii-422 36,824-4(120,486) = 5702-5. 

i k i i k 


The equations to solve for the (4-IT) are therefore 

6 X 2785-5 —6 2 887-5-6 3 836-3-6 4 600-0 = 706-7, 

-&i 887-5+ 6 a 3464-3-6., 812-0-6 4 513-5 = 2490-0, 

~&i 836-3 —6 2 812-0 + 6 3 2826-7 —6 4 554-7 = 34-0, 

600-0 — 6 a 513-5 —6 a 554-7+ 6 4 1419-7 = -1447-3. 






































45 


A. H. Carter 

These equations may best be solved by using the inverse matrix of coefficients 

{Oik) ~ (Pik) 1 

(see (419)) which will itself be required later, 

= 10- 8 / 774 408 476 661 \ . 

/ 408 562 385 526 | 

i 476 385 728 625 J 

\ 061 526 625 1419/ 


We then obtain b x = 0-6234, b, - 0-9387, b 3 = 0-4164, b 4 = - 0-2534. 

Now EE hQ'ik - E6i{s%] = (0*6234) (706*7)+ ..■ = 3158-9, 

i, fe* 1 i \ k / 

hence the sum of squares of deviations from the separate regressions, i.e. S2(j^~ *«)*' is ‘ 


from (4*12), 


SEi*tt-SS^0« - 5702-5-3158*9 = 2543*6. 

i It i, k 


From (4*13) therefore, s 1 = 2543*6/26 = 97*83. 

For the pooled regression, (4*14') gives 

l = ESQa/SS**" 178B-5/2088-25 = 0*8541, 

i fc I i k 

and by (4*15), the sum of squares of deviations from the pooled regression, EE (2/y - J r y)*» ,H 

i j 

EE 5U-i^22 % = 5702*5- (0*8541) (1783*5) « 4179*3. 

i,k i k 

The sum of squares due to differences between the separate regressions and the pooled 
regression is therefore (4179*3-2543*6) = 1635*7. The appropriate analysis of variance is: 


Source of variation. 

Degrees of 
freodom 

Sum of 
squares 

Mean 

square 

Due to pooled regression 

1 

1523*2 


Differences between separate regressions 

3 

1635*7 

545*2 a «* 

and pooled regression 




Due to separate regressions 

! 

4 1 

3158*0 

789*7 a 4 

Deviations from separate regressions 

20 

2543*6 

97*83 « 

Total 

i 

30 

5702*5 



To test the significance of the general regression, the variance-ratio is 789*7/97*83 = 8-1*** 
(4 and 26 degrees of freedom), significant at 0*1 %. For the homogeneity of the separate 
regression coefficients, the appropriate variance-ratio is 545*2/97*83 = 5*6** (3 and 26 degrees 
of freedom), which is significant at the 1 % level. We conclude therefore that the regression is 
on the whole very significant, but there is strong evidence of heterogeneity as between 
samples. We may wish to enquire where this heterogeneity lies, 



46 The estimation and comparison of residual regressions 

For the significance of the separate coefficients, we have 

ki) = where s = V 97 ‘ 83 = 9-891 (26 degrees of freedom). 

Hence = O-024/{9*891 V(?74 x 1(H)} = 2-26* 

Similarly \) = 4-0**, t%\ = 1*55, k b{) = 0-68. 

To test the significance of the difference between any two coefficients, ) b i - b k | for 
example, we have 1 1 bi _ bl .] distributed as ‘Student’s ’t (on the hypothesis — /?;.), where 

hb i -b k \ = \h-K\M9u+9kk- i 9ik)l 

Examples are: 

f M<) = O*6698/{0-891xlO-»V(728 +1419-2x625)} = 2-26*, 

h r b 3 ) - °' 89 1 ki- bj) = = 2 ‘ 31 *‘ 

To summarize, the regression of weight of wool per square on the area of the square is 
adjudged significant for the fore and back regions, but not for the hip and britch regions. As 
regards the separate regression coefficients, that for the britch region is significantly less than 
for the other three regions; that for the back region is significantly greater than for the hip 
region; the differences between the fore region on one hand, and the back and hip regions on 
the other, are not significant. 

I wish to record my thanks to Dr J. Wishart for assistance and-advice in compiling this 
paper. I am indebted also to Dr H. E. Daniels for initially drawing my attention to the 
problem dealt with; to the Wool Industries Research Association for placing at my disposal 
the data of Example 2; and to Professor E. S. Pearson for constructive advice on the presenta¬ 
tion of the paper, 


REFERENCES 

Bartlett, M: S. (1933). On the theory of statistical regression. Proc. Boy . Soc. Edinb. 53, 260, 
Bartlett, M. S. (1934), The problem in statistics of testing several variances. Proc, Camb. Phil, Soc. 
30, 164. 

Galfin, N. (1947), A study of wool growth, I. /, Agrio. Sci, 37, 276. 

Kolodziejczvk, St. (1936), On an important class of statistical hypotheses. Biometrika, 27, 161. 
Welch, B. L. (1935). Soma problems in the analysis of regression among h samples of two variables. 
Biometrika, 27, 145. 

Yates, F. (1933). The principles of orthogonality and confounding in replicated experiments. J.Agric. 
Sci, 23, 108. 

Yates, F. (1939), Tests of significance of the differences between regression coefficients derived from 
two sets of correlated variates. Proc. Boy. Soc. Edinb. 59, 184. 



[ 47 ] 


CUMULANTS OF MULTIVARIATE MULTINOMIAL DISTRIBUTIONS 

By JOHN WISHART, Statistical Laboratory , University of Cambridtjc 


1. For the ordinary (Bernoulli) multinomial distribution in ono variable a simple eumulant 
recurrence relation is clue to Guldberg (1935) and is deduced as follows: 

Let an event, for which the chance of failure isp 0 , happen in any one of n ways, with proba¬ 
bilities p x ,p 2 , ■ ■ ■, Pn' Then the chance that in a random sample of s trials thore will bo .r, 
successes of the first kind, x 2 of the second, ..., x n of the last, will bo 


s : 




iPo’Pi'.-.Pn", 


being the general term in the multinomial expansion of 

(i>o+Ti + ---+T») 8 > 


where 


Pa = 1 - £Pi, 


Xn - 




71 


S (*,)• 


The probability generating function (p.g.f.) is 



2 Pt 

i-l 


«*) =a# _s |l + i S i ^a i j 


(M) 


if we put Pi/po = (i = 1,2so that ^ = aja 0> where a 0 = 1+ £ 

i-U 

To obtain the eumulant generating function (e.g.f.) we put a t = e!‘ in (I-l) and take the 
natural logarithm. In the usual notation we then have 


It follows that 


K — s{—In (1 + Laf) -f In (l + Sa^et)), 

9 K sa t e li 

ct { 1 + Ea^e** 


K -' 1 -- + Sai [l+La i e‘i 1+Ntt} 




dK 


da i 


since /c x , the first-order eumulant (or mean), is given by 


( 1 ‘ 2 J 


/C 


. 1 ..— 


BK\ = _m t 

/ 1^0 1 + Ntt; 


d-3) 


The unit in the subscript for k is regarded as being in the ith place. 

, We differentiate (1 - 2) r f times with respect to ^ for all i = 1, 2, ...,n. Then on changing 

the order of the differentiation on the right-hand side and putting t, = 0, for i = 1 2 n 
we have ’ ’ 

3 

' C r l r 1 ... ri +l...r n = a ifa(Xr 1 r 1 ...r i ...r„), 

1 


(1-4) 




48 Cumulants of multivariate multinomial distributions 

where r^O with at least one non-zero r. This is Guldberg’s result, of which all simpler 
results are special cases. Thus for the ordinary binomial distribution we have 

dtc T . ,, 


where a = p/(l -p) = pjq- Alternatively, we may write 

da. 

the well-known result due to Frisch (1925), and rediscovered by Haldane (1940). 

The cumulants are easily worked out in the multinomial case. We start with (1-3), namely, 

*.i.. = s%'a 0 = 8Pi> 


rs r\ 

and use the relations a;-— = - a*— 
1 3a,. 1 3a,. 


Pi9i, where q t = l-p t 


tylt 


a,- ~ - a,-~ = -PiPj (i =¥j). 


§Pi. 

1 da i -i 

With the m inimum of algebraic manipulation we then obtain the cumulants to any desired 
order. To the fourth order they are: 

*..4.. = **f?f(l- 0 Pi?i). 


*..81. = 

*..22. = - mPjiiii-Pi) (?y -Pt)+iPiPtl 
*..2U. = ^P i PjPM~ 2 Pi)> 

* 1111 . = ~ toPiPjPhPl- 


(1-5) 


*.. 1.. = sp» 

*..2.. = «*<?* 

*..11, =-8PiPj> 

*..3.. = SPiVitii-Pi), 

*..21. = ~SPiPMi-Pi)’ 

* 111. = ^PiPjPk, 

In the above the order of the non-zero subscripts in any k is that of the introduced 

on the right-hand side. These results comprise all special cases. Mention should be made of 
papers by Qvale (1932) and Gotaas (1936), who dealt with the conditions to be satisfied by 
a general discontinuous frequency distribution for a simple cumulant recurrence relationship 
of the pattern of (1-4) to hold. 

2. The negative binomial, i.e. the binomial distribution with negative index, is well known. 
Its multinomial generalization (for one variable) has been called the Pascal multinomial 
distribution, and this name will be used to distinguish it from the Bernoulli distribution. We 
are here concerned with the joint probability that an event (with constant probability p { ) 
shall occur % t times, for % = 1,2,..., n, and that the event not {A x A s ... A n ), with probability 

n 

jp 0 , shall occur s times (including the result of the last trial) out of s+ £ x i repeated trials. 

i=i 

This probability is given by 

(s-l+Sx)! „ T 


being the general term in the multinomial expansion of 

Po{'-Pi-Pz-~.-p n )-°, 

r- 


*3 1 


2 PiOCi 
i=l 


where p 0 = 1 - l Pi . The p.g.f. is 


(2T) 



John Wishart 


49 


while the c.g.f. may be written 

K = s{ln (1 -Xp t ) -In (1 - SpA)}- 


It follows that 


a K 

dt } 




pi& 


= *..i..+p 


97 $: 


sinoe 


hh’ 

= W . w* 

" 1 " l dtj ti =n ^ — -hPi 


( 2 . 2 ) 

(2-3) 


As before, differentiate (2-2) r i times with respect to t t for all i = 1,2,Changing the 
order of differentiation and putting t t = 0, for i = 1, 2, we have 

9 f 

‘'rir 2 ...ri...r n / 


'riTa...ri+1...r B ~ Pi dp-^ Kr 


J> 


(2-4) 


where 0 with at least one non-zero r. This result was also deduced by Guldberg (1935), 
A special case is that for the negative binomial, and is

$$\kappa_{r+1} = p\,\frac{d\kappa_r}{dp} \qquad (r \geq 1),$$

as used, with a slight change of notation, by the author in a former paper (Wishart, 1947). It should be noted that the simplest form of the Bernoulli relation is in terms of $a_i$, as in (1·4), whereas $p_i$ takes the place of $a_i$ in the simplest form (2·4) of the Pascal relation.

On the other hand, the cumulants of the Bernoulli distribution are most simply expressed in terms of $p_i$ and $q_i$, as in (1·5), whereas those of the Pascal distribution can be written most simply in terms of $a_i$ and $b_i = 1 + a_i$. Thus we start with (2·3), namely,

$$\kappa_{..1..} = \frac{s\,p_i}{p_0} = s\,a_i,$$

and use the relations

$$p_i\frac{\partial a_i}{\partial p_i} = p_i\frac{\partial b_i}{\partial p_i} = a_ib_i, \qquad p_i\frac{\partial a_j}{\partial p_i} = p_i\frac{\partial b_j}{\partial p_i} = a_ia_j \quad (i \neq j).$$

To the fourth order the cumulants are:


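$$\begin{aligned}
\kappa_{..1..} &= s\,a_i,\\
\kappa_{..2..} &= s\,a_ib_i, \qquad \kappa_{..11..} = s\,a_ia_j,\\
\kappa_{..3..} &= s\,a_ib_i(b_i+a_i), \qquad \kappa_{..21..} = s\,a_ia_j(b_i+a_i), \qquad \kappa_{..111..} = 2s\,a_ia_ja_k,\\
\kappa_{..4..} &= s\,a_ib_i(1+6a_ib_i), \qquad \kappa_{..31..} = s\,a_ia_j(1+6a_ib_i),\\
\kappa_{..22..} &= s\,a_ia_j\{(b_i+a_i)(b_j+a_j)+2a_ia_j\},\\
\kappa_{..211..} &= 2s\,a_ia_ja_k(b_i+2a_i), \qquad \kappa_{..1111..} = 6s\,a_ia_ja_ka_l.
\end{aligned}\qquad (2{\cdot}5)$$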


These results comprise all special cases. The formulae in (2·5) are similar in form to those in (1·5), but with the signs all positive, and we may perhaps repeat that

$$p_i/p_0 = a_i, \quad\text{so that}\quad p_i = a_i/a_0,$$

where $p_0 = 1-\sum p_i$ and $a_0 = 1+\sum a_i$.
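The cumulants (2·5) can be checked numerically by brute force. The sketch below uses hypothetical values $n = 2$, $s = 3$, $p_1 = 0.2$ and $p_2 = 0.1$. It sums the Pascal multinomial probabilities over a truncated grid; the neglected tail is negligible for these values. It then compares cumulants obtained from central moments with the closed forms, taking $i = 1$ and $j = 2$.

```python
import math
import numpy as np

# Hypothetical parameters: two event types A_1, A_2 and the stopping event (prob. p0).
s, p1, p2 = 3, 0.2, 0.1
p0 = 1.0 - p1 - p2
a1, a2 = p1 / p0, p2 / p0            # a_i = p_i / p_0
b1, b2 = 1.0 + a1, 1.0 + a2          # b_i = 1 + a_i

M = 150                              # truncation of each count; the tail is negligible here
# lg[v] = ln Gamma(v) for v >= 1 (index 0 is an unused placeholder)
lg = np.array([0.0] + [math.lgamma(v) for v in range(1, 2 * M + s + 1)])
x1, x2 = np.meshgrid(np.arange(M), np.arange(M), indexing="ij")

# ln of (s-1+x1+x2)! / ((s-1)! x1! x2!) * p0^s * p1^x1 * p2^x2
logp = (lg[s + x1 + x2] - lg[s] - lg[x1 + 1] - lg[x2 + 1]
        + s * math.log(p0) + x1 * math.log(p1) + x2 * math.log(p2))
prob = np.exp(logp)

d1 = x1 - np.sum(prob * x1)          # deviations from the means
d2 = x2 - np.sum(prob * x2)

def mu(i, j):
    """Central product moment mu_ij."""
    return np.sum(prob * d1 ** i * d2 ** j)

numeric = {
    "2": mu(2, 0), "11": mu(1, 1), "3": mu(3, 0), "21": mu(2, 1),
    "4": mu(4, 0) - 3 * mu(2, 0) ** 2,
    "31": mu(3, 1) - 3 * mu(2, 0) * mu(1, 1),
    "22": mu(2, 2) - mu(2, 0) * mu(0, 2) - 2 * mu(1, 1) ** 2,
}
closed = {                           # formulae (2.5) with i = 1, j = 2
    "2": s * a1 * b1, "11": s * a1 * a2,
    "3": s * a1 * b1 * (b1 + a1), "21": s * a1 * a2 * (b1 + a1),
    "4": s * a1 * b1 * (1 + 6 * a1 * b1),
    "31": s * a1 * a2 * (1 + 6 * a1 * b1),
    "22": s * a1 * a2 * ((b1 + a1) * (b2 + a2) + 2 * a1 * a2),
}
for key in closed:
    print(key, numeric[key], closed[key])
```

Each printed pair should agree to rounding error.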

3. We shall now consider the corresponding multivariate distributions of both the Bernoulli 
and Pascal kinds, with a view to deriving the appropriate cumulant recurrence relations. 


From these we shall work out all the different cumulants, up to the fourth order. The work is 
straightforward, the chief difficulty being to devise a satisfactory notation in order to identify 
and condense the formulae. The methods are sufficiently illustrated by taking the simplest 
case of the bivariate binomial, first for the Bernoulli case and then for Pascal. 

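The two variates are the numbers of successes in two events. Let the respective probabilities be as in the following table:

$$\begin{array}{ll|ccc}
 & & \multicolumn{3}{c}{\text{2nd event}}\\
 & & \text{Success} & \text{Failure} & \text{Total}\\
\hline
\text{1st event} & \text{Success} & p_{11} & p_{10} & p\\
 & \text{Failure} & p_{01} & p_{00} & q\\
 & \text{Total} & p' & q' & 1
\end{array}$$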

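We have $q = 1-p$, $q' = 1-p'$ and $p_{00} = 1-p_{10}-p_{01}-p_{11}$. We note also that

$$p_{00}p_{11} - p_{10}p_{01} = p_{11} - pp'.$$

The probability of $x_1$ successes in the first event (and $s-x_1$ failures) and of $x_2$ successes in the second event (and $s-x_2$ failures) is given by the following function of the two variables $x_1$ and $x_2$:

$$\sum_{y=0}^{\min(x_1,x_2)}\binom{s}{x_1}\binom{x_1}{y}\binom{s-x_1}{x_2-y}\,p_{11}^{\,y}\,p_{10}^{\,x_1-y}\,p_{01}^{\,x_2-y}\,p_{00}^{\,s-x_1-x_2+y}, \qquad (3{\cdot}1)$$

in which

$$\binom{c}{d} = \frac{c!}{d!\,(c-d)!}.$$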

It may be deduced from (3·1) (or more simply from the p.g.f. below; see §4) that the marginal distributions are represented by the binomials $(q+p)^s$ and $(q'+p')^s$ respectively. Also when $p_{11} = pp'$, so that $p_{10} = pq'$, $p_{01} = p'q$, $p_{00} = qq'$, (3·1) becomes the product of two binomial forms, so that the distributions are independent.

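The p.g.f. is

$$(p_{00} + p_{10}\alpha + p_{01}\beta + p_{11}\alpha\beta)^s = p_{00}^{\,s}(1 + a_{10}\alpha + a_{01}\beta + a_{11}\alpha\beta)^s, \qquad (3{\cdot}2)$$

if we put $p_{10}/p_{00} = a_{10}$, $p_{01}/p_{00} = a_{01}$, $p_{11}/p_{00} = a_{11}$, so that $1/p_{00} = 1 + a_{10} + a_{01} + a_{11} = a_{00}$, say. Putting $\alpha = e^t$, $\beta = e^u$ we have for the c.g.f.

$$K = s\{-\ln a_{00} + \ln(1 + a_{10}e^t + a_{01}e^u + a_{11}e^{t+u})\}.$$

Then

$$\frac{\partial K}{\partial t} = \frac{s\,e^t(a_{10}+a_{11}e^u)}{D}, \qquad \frac{\partial K}{\partial u} = \frac{s\,e^u(a_{01}+a_{11}e^t)}{D},$$

where $D = 1 + a_{10}e^t + a_{01}e^u + a_{11}e^{t+u}$, and

$$\kappa_{10} = \Bigl[\frac{\partial K}{\partial t}\Bigr]_{t=u=0} = \frac{s(a_{10}+a_{11})}{a_{00}}, \qquad \kappa_{01} = \Bigl[\frac{\partial K}{\partial u}\Bigr]_{t=u=0} = \frac{s(a_{01}+a_{11})}{a_{00}} \qquad (3{\cdot}3)$$

are the means of the variables $x_1$ and $x_2$. It follows that

$$\frac{\partial K}{\partial t} = \kappa_{10} + a_{10}\frac{\partial K}{\partial a_{10}} + a_{11}\frac{\partial K}{\partial a_{11}}, \qquad \frac{\partial K}{\partial u} = \kappa_{01} + a_{01}\frac{\partial K}{\partial a_{01}} + a_{11}\frac{\partial K}{\partial a_{11}}. \qquad (3{\cdot}4)$$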

Differentiating $r_1$ times with respect to $t$ and $r_2$ times with respect to $u$, and changing the order of differentiation on the right-hand side, we have, on putting $t = u = 0$ (with $r_1 + r_2 \geq 1$),

$$\kappa_{r_1+1,\,r_2} = \Bigl(a_{10}\frac{\partial}{\partial a_{10}} + a_{11}\frac{\partial}{\partial a_{11}}\Bigr)\kappa_{r_1r_2}, \qquad \kappa_{r_1,\,r_2+1} = \Bigl(a_{01}\frac{\partial}{\partial a_{01}} + a_{11}\frac{\partial}{\partial a_{11}}\Bigr)\kappa_{r_1r_2}. \qquad (3{\cdot}5)$$

These are the required cumulant recurrence relations. To use them we start with (3·3), namely, $\kappa_{10} = sp$, $\kappa_{01} = sp'$, and can then readily verify that if $\Delta$ and $\Delta'$ be written for the complete operators on the right-hand side of (3·5), we have

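$$\Delta(p) = -\Delta(q) = pq, \qquad \Delta'(p') = -\Delta'(q') = p'q',$$

$$\Delta(p') = \Delta'(p) = -\Delta(q') = -\Delta'(q) = p_{11} - pp',$$

also $\Delta(p_{11}-pp') = (q-p)(p_{11}-pp')$ and $\Delta'(p_{11}-pp') = (q'-p')(p_{11}-pp')$.

Let us agree to write $p_{(11)}$ for $p_{11} - pp'$. Then, to the fourth order,

$$\begin{aligned}
\kappa_{10} &= sp, & \kappa_{01} &= sp',\\
\kappa_{20} &= spq, \quad \kappa_{11} = s\,p_{(11)}, & \kappa_{02} &= sp'q',\\
\kappa_{30} &= spq(q-p), & \kappa_{03} &= sp'q'(q'-p'),\\
\kappa_{21} &= s\,p_{(11)}(q-p), & \kappa_{12} &= s\,p_{(11)}(q'-p'),\\
\kappa_{40} &= spq(1-6pq), & \kappa_{04} &= sp'q'(1-6p'q'),\\
\kappa_{31} &= s\,p_{(11)}(1-6pq), & \kappa_{13} &= s\,p_{(11)}(1-6p'q'),\\
\kappa_{22} &= s\,p_{(11)}\{(q-p)(q'-p') - 2p_{(11)}\}.
\end{aligned}\qquad (3{\cdot}6)$$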




In the special case $p_{11} = 0$ we see from (3·2) that the case is that of a univariate trinomial, and, in fact, formulae (3·6) become equivalent to (1·5).
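The bivariate binomial of §3 is the sum of $s$ independent trials, each with the four outcomes of the table above. Every cumulant in (3·6) is therefore $s$ times the corresponding cumulant of a single trial. The sketch below uses that fact to check (3·6) numerically; the cell probabilities are hypothetical.

```python
import numpy as np

# Hypothetical single-trial probabilities of the four cells of the 2 x 2 table.
p11, p10, p01, p00 = 0.30, 0.10, 0.20, 0.40
s = 5
outcomes = np.array([[1, 1], [1, 0], [0, 1], [0, 0]], dtype=float)
w = np.array([p11, p10, p01, p00])

d = outcomes - w @ outcomes          # deviations from the single-trial means

def mu(i, j):
    """Single-trial central product moment mu_ij."""
    return w @ (d[:, 0] ** i * d[:, 1] ** j)

single = {
    (2, 0): mu(2, 0), (1, 1): mu(1, 1), (0, 2): mu(0, 2),
    (3, 0): mu(3, 0), (2, 1): mu(2, 1),
    (4, 0): mu(4, 0) - 3 * mu(2, 0) ** 2,
    (3, 1): mu(3, 1) - 3 * mu(2, 0) * mu(1, 1),
    (2, 2): mu(2, 2) - mu(2, 0) * mu(0, 2) - 2 * mu(1, 1) ** 2,
}
numeric = {rs: s * k for rs, k in single.items()}   # cumulants add over independent trials

p, pp = p11 + p10, p11 + p01         # marginal p and p'
q, qq = 1 - p, 1 - pp
P = p11 - p * pp                     # p_(11)
closed = {                           # formulae (3.6)
    (2, 0): s * p * q, (1, 1): s * P, (0, 2): s * pp * qq,
    (3, 0): s * p * q * (q - p), (2, 1): s * P * (q - p),
    (4, 0): s * p * q * (1 - 6 * p * q), (3, 1): s * P * (1 - 6 * p * q),
    (2, 2): s * P * ((q - p) * (qq - pp) - 2 * P),
}
for rs in closed:
    print(rs, numeric[rs], closed[rs])
```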

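4. There is a corresponding bivariate negative binomial, or Pascal, distribution, for which the p.g.f. is

$$p_{00}^{\,s}(1 - p_{10}\alpha - p_{01}\beta - p_{11}\alpha\beta)^{-s}, \qquad (4{\cdot}1)$$

where $p_{00} = 1 - p_{10} - p_{01} - p_{11}$. To show that this is the correct form, write it as

$$p_{00}^{\,s}\{(1-p_{01}\beta) - (p_{10}+p_{11}\beta)\alpha\}^{-s} = p_{00}^{\,s}\sum_{x_1=0}^{\infty}\binom{s+x_1-1}{x_1}(1-p_{01}\beta)^{-s-x_1}(p_{10}+p_{11}\beta)^{x_1}\alpha^{x_1}.$$

It follows from this that the marginal distribution of $x_2$, obtained by summing the general term of the distribution whose p.g.f. is (4·1) from $x_1 = 0$ to $\infty$, has the p.g.f.

$$\begin{aligned}
p_{00}^{\,s}\sum_{x_1=0}^{\infty}\binom{s+x_1-1}{x_1}(1-p_{01}\beta)^{-s-x_1}(p_{10}+p_{11}\beta)^{x_1}
&= p_{00}^{\,s}\{(1-p_{01}\beta) - (p_{10}+p_{11}\beta)\}^{-s}\\
&= p_{00}^{\,s}\{(1-p_{10}) - (p_{01}+p_{11})\beta\}^{-s},
\end{aligned}$$

and therefore represents a negative binomial or Pascal distribution.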



Likewise, the marginal distribution of $x_1$ has the p.g.f.

$$p_{00}^{\,s}\{(1-p_{01}) - (p_{10}+p_{11})\alpha\}^{-s}.$$


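Now beginning with (4·1) and writing $\alpha = e^t$, $\beta = e^u$, it is easy to show by the method of §3, but without this time transforming to $a_{10}$, $a_{01}$, etc., that (for $r_1 + r_2 \geq 1$)

$$\kappa_{r_1+1,\,r_2} = \Bigl(p_{10}\frac{\partial}{\partial p_{10}} + p_{11}\frac{\partial}{\partial p_{11}}\Bigr)\kappa_{r_1r_2}, \qquad \kappa_{r_1,\,r_2+1} = \Bigl(p_{01}\frac{\partial}{\partial p_{01}} + p_{11}\frac{\partial}{\partial p_{11}}\Bigr)\kappa_{r_1r_2}, \qquad (4{\cdot}2)$$

with $\kappa_{10} = s(p_{10}+p_{11})/p_{00} = s(a_{10}+a_{11})$ and $\kappa_{01} = s(p_{01}+p_{11})/p_{00} = s(a_{01}+a_{11})$.

Let us write $a$ for $a_{10}+a_{11}$ and $b$ for $1+a$; likewise $a'$ for $a_{01}+a_{11}$ and $b'$ for $1+a'$. Then, using $P$ and $P'$ for the complete operators in (4·2), we readily verify that

$$P(a) = P(b) = ab, \qquad P'(a') = P'(b') = a'b',$$

$$P(a') = P(b') = P'(a) = P'(b) = a_{11} + aa',$$

$$P(a_{11}+aa') = (b+a)(a_{11}+aa'), \qquad P'(a_{11}+aa') = (b'+a')(a_{11}+aa').$$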


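Let us write $a_{(11)}$ for $a_{11}+aa'$. Then, to the fourth order,

$$\begin{aligned}
\kappa_{10} &= sa, & \kappa_{01} &= sa',\\
\kappa_{20} &= sab, \quad \kappa_{11} = s\,a_{(11)}, & \kappa_{02} &= sa'b',\\
\kappa_{30} &= sab(b+a), & \kappa_{03} &= sa'b'(b'+a'),\\
\kappa_{21} &= s\,a_{(11)}(b+a), & \kappa_{12} &= s\,a_{(11)}(b'+a'),\\
\kappa_{40} &= sab(1+6ab), & \kappa_{04} &= sa'b'(1+6a'b'),\\
\kappa_{31} &= s\,a_{(11)}(1+6ab), & \kappa_{13} &= s\,a_{(11)}(1+6a'b'),\\
\kappa_{22} &= s\,a_{(11)}\{(b+a)(b'+a') + 2a_{(11)}\}.
\end{aligned}\qquad (4{\cdot}3)$$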
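A similar check applies to (4·3). Substituting $\alpha$, $\beta$ and $\alpha\beta$ for the three arguments of the p.g.f. of a trinomial Pascal distribution of counts $(y_{10}, y_{01}, y_{11})$ gives (4·1), so that $x_1 = y_{10} + y_{11}$ and $x_2 = y_{01} + y_{11}$. The sketch below uses that representation with hypothetical values $s = 2$, $p_{10} = 0.15$, $p_{01} = 0.10$ and $p_{11} = 0.05$. It sums over a truncated grid and compares the resulting cumulants with (4·3).

```python
import math
import numpy as np

# Hypothetical primary probabilities for the bivariate Pascal distribution (4.1).
s = 2
p10, p01, p11 = 0.15, 0.10, 0.05
p00 = 1.0 - p10 - p01 - p11

# Trinomial Pascal counts (y10, y01, y11); alpha -> x1 = y10 + y11, beta -> x2 = y01 + y11.
M = 60                                     # truncation of each y; the tail is negligible here
lg = np.array([0.0] + [math.lgamma(v) for v in range(1, 3 * M + s + 1)])   # lg[v] = ln Gamma(v)
y10, y01, y11 = np.meshgrid(np.arange(M), np.arange(M), np.arange(M), indexing="ij")
logp = (lg[s + y10 + y01 + y11] - lg[s] - lg[y10 + 1] - lg[y01 + 1] - lg[y11 + 1]
        + s * math.log(p00) + y10 * math.log(p10) + y01 * math.log(p01) + y11 * math.log(p11))
w = np.exp(logp)
x1, x2 = y10 + y11, y01 + y11
d1, d2 = x1 - np.sum(w * x1), x2 - np.sum(w * x2)

def mu(i, j):
    """Central product moment mu_ij of (x1, x2)."""
    return np.sum(w * d1 ** i * d2 ** j)

numeric = {
    (2, 0): mu(2, 0), (1, 1): mu(1, 1), (0, 2): mu(0, 2),
    (3, 0): mu(3, 0), (2, 1): mu(2, 1),
    (4, 0): mu(4, 0) - 3 * mu(2, 0) ** 2,
    (3, 1): mu(3, 1) - 3 * mu(2, 0) * mu(1, 1),
    (2, 2): mu(2, 2) - mu(2, 0) * mu(0, 2) - 2 * mu(1, 1) ** 2,
}
a, ap = (p10 + p11) / p00, (p01 + p11) / p00   # a = a10 + a11, a' = a01 + a11
b, bp = 1 + a, 1 + ap
A = p11 / p00 + a * ap                          # a_(11) = a11 + a a'
closed = {                                      # formulae (4.3)
    (2, 0): s * a * b, (1, 1): s * A, (0, 2): s * ap * bp,
    (3, 0): s * a * b * (b + a), (2, 1): s * A * (b + a),
    (4, 0): s * a * b * (1 + 6 * a * b), (3, 1): s * A * (1 + 6 * a * b),
    (2, 2): s * A * ((b + a) * (bp + ap) + 2 * A),
}
for rs in closed:
    print(rs, numeric[rs], closed[rs])
```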


5. The reciprocal character of the Bernoulli and Pascal results, in the notation which has been adopted, should by now be sufficiently obvious for it to be necessary only to consider one of these cases in its most general multivariate multinomial form. We shall choose the more familiar Bernoulli case. The general distribution will be one in $m$ variables, such that the $m$ marginal distributions are multinomials, not necessarily of the same order. A typical one will be described as an $(n_i+1)$-nomial, denoting that the marginal distribution for $x_i$ (where $i = 1, 2, \ldots, m$) is a multinomial of the $(n_i+1)$th order. Thus it will be derived by considering the chance of success of an event in any one of $n_i$ ways, with probabilities $p_1^{(i)}, p_2^{(i)}, \ldots, p_{n_i}^{(i)}$, failure being denoted by $p_0^{(i)}$. The superscript $(i)$ denotes the practice already exemplified in §§3 and 4, i.e. $p$ will refer to the first variate, $p'$ to the second, $p''$ to the third and so on. The letter $j$ will denote the general member of the subscripts $1, 2, \ldots, n_i$.

These $p$'s which have been defined are marginal probabilities. The primary probabilities will be denoted by the letter $p$ with $m$ suffices, taken in order for the variates (or events) $1, 2, \ldots, m$, each suffix being one of the numbers $0, 1, 2, \ldots, n_i$. Failure in all events will be denoted by $p_{00\ldots0}$. There will be a unitary series, of which a typical set is $p_{j0\ldots0}, p_{0j0\ldots0}, \ldots, p_{0\ldots0j}$, altogether $\sum_{i=1}^{m}n_i$ probabilities. The binary series will have two non-zero suffices, the ternary series three, and so on. If $\Sigma(p)$ denotes the sum of all $p$'s with a non-zero suffix anywhere, $p_{00\ldots0}$ will be $1-\Sigma(p)$. We may, as before, put $p_{\ldots}/p_{00\ldots0} = a_{\ldots}$, where $p_{\ldots}$ denotes any $p$ except $p_{00\ldots0}$, and it follows that $p_{\ldots} = a_{\ldots}/a_{00\ldots0}$, where $a_{00\ldots0} = 1+\Sigma(a)$, this last summation denoting the sum of all $a$'s with a non-zero suffix anywhere.

A proof on the same lines as in the particular case dealt with in §3, details of which it is hardly necessary to give, yields a set of cumulant recurrence relations which may be expressed symbolically by

$$\kappa(ij+1) = \Delta_j^{(i)}\,\kappa(ij). \qquad (5{\cdot}1)$$

There are $\Sigma(n_i)$ separate formulae in this expression, arranged in $m$ sets. $\Delta_j^{(i)}$ is an operator of the form $\Sigma(a\,\partial/\partial a)$, i.e. the sum of all terms of the form $a\,\partial/\partial a$ with suffices to the $a$'s such that $j$ occurs in the $i$th suffix, the number of terms being such as to permit of all possible combinations of the numbers $0, 1, 2$, etc., in the remaining suffices. Thus, to illustrate from a $(4\times3\times2)$ table, i.e. a three-variate case in which the first variate is 4-nomial, the second 3-nomial and the third binomial, we shall have:


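$$\begin{aligned}
\Delta_1 &= a_{100}\,\partial/\partial a_{100} + a_{110}\,\partial/\partial a_{110} + a_{120}\,\partial/\partial a_{120} + a_{101}\,\partial/\partial a_{101} + a_{111}\,\partial/\partial a_{111} + a_{121}\,\partial/\partial a_{121},\\
\Delta_2 &= a_{200}\,\partial/\partial a_{200} + a_{210}\,\partial/\partial a_{210} + a_{220}\,\partial/\partial a_{220} + a_{201}\,\partial/\partial a_{201} + a_{211}\,\partial/\partial a_{211} + a_{221}\,\partial/\partial a_{221},\\
\Delta_3 &= a_{300}\,\partial/\partial a_{300} + a_{310}\,\partial/\partial a_{310} + a_{320}\,\partial/\partial a_{320} + a_{301}\,\partial/\partial a_{301} + a_{311}\,\partial/\partial a_{311} + a_{321}\,\partial/\partial a_{321},\\
\Delta'_1 &= a_{010}\,\partial/\partial a_{010} + a_{110}\,\partial/\partial a_{110} + a_{210}\,\partial/\partial a_{210} + a_{310}\,\partial/\partial a_{310}\\
&\quad + a_{011}\,\partial/\partial a_{011} + a_{111}\,\partial/\partial a_{111} + a_{211}\,\partial/\partial a_{211} + a_{311}\,\partial/\partial a_{311},\\
\Delta'_2 &= a_{020}\,\partial/\partial a_{020} + a_{120}\,\partial/\partial a_{120} + a_{220}\,\partial/\partial a_{220} + a_{320}\,\partial/\partial a_{320}\\
&\quad + a_{021}\,\partial/\partial a_{021} + a_{121}\,\partial/\partial a_{121} + a_{221}\,\partial/\partial a_{221} + a_{321}\,\partial/\partial a_{321},\\
\Delta''_1 &= a_{001}\,\partial/\partial a_{001} + a_{101}\,\partial/\partial a_{101} + a_{201}\,\partial/\partial a_{201} + a_{301}\,\partial/\partial a_{301} + a_{011}\,\partial/\partial a_{011} + a_{111}\,\partial/\partial a_{111}\\
&\quad + a_{211}\,\partial/\partial a_{211} + a_{311}\,\partial/\partial a_{311} + a_{021}\,\partial/\partial a_{021} + a_{121}\,\partial/\partial a_{121} + a_{221}\,\partial/\partial a_{221} + a_{321}\,\partial/\partial a_{321}.
\end{aligned}\qquad (5{\cdot}2)$$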
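The rule for $\Delta_j^{(i)}$ is mechanical: fix the $i$th suffix at $j$ and let every other suffix take all its values. The sketch below generates those suffixes for the $(4\times3\times2)$ table; the function name `operator_terms` is illustrative only. It yields six operators with 6, 6, 6, 8, 8 and 12 terms. These are the same sets as in (5·2), listed in a different order.

```python
from itertools import product

def operator_terms(sizes, i, j):
    """Suffixes of the a's in the operator Delta_j^(i) for a table whose variates
    have sizes[0], sizes[1], ... categories (suffix 0 = failure).

    The i-th suffix (counting from 0) is fixed at j >= 1; every other suffix runs
    over all its values.
    """
    ranges = [range(n) for n in sizes]
    ranges[i] = [j]
    return ["".join(map(str, t)) for t in product(*ranges)]

sizes = (4, 3, 2)                     # the (4 x 3 x 2) table used in the text
for i, n in enumerate(sizes):
    for j in range(1, n):
        print(f"variate {i + 1}, j = {j}:", operator_terms(sizes, i, j))
```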


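$\kappa(ij)$ denotes a cumulant which may be written in full as

$$\kappa(ij) = \kappa\begin{pmatrix}
r_{11} & r_{12} & \cdots & r_{1n_1}\\
r_{21} & r_{22} & \cdots & r_{2n_2}\\
\vdots & & & \\
r_{m1} & r_{m2} & \cdots & r_{mn_m}
\end{pmatrix}.$$

A line is devoted to each variate, and the $r$'s denote orders of cumulants, the total order being $\Sigma(r)$; $r_{ij}$ occurs in the $i$th row and $j$th column. $\kappa(ij+1)$ is a similar cumulant in which $r_{ij}$ is replaced by $r_{ij}+1$.

As before, we begin with any first-order cumulant $\kappa = s\,p_j^{(i)}$.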

To work out the cumulants to any fixed order we need only go to an order in the multivariate multinomial sufficient to distinguish all the separate cumulants that can occur. In recording these we need only put down a typical one of each kind. To the fourth order, for example, we need not go beyond a $(5\times4\times3\times2)$ table. As a check, indeed, we can enumerate the formulae that can arise from such a table, even though many of the formulae in the list can be derived from simpler tables. In the results which follow the separate patterns are given, together with the number of formulae which conform to each type and the cumulant for the type. The cumulants are distinguished by their suffices, with dots separating the variates. Thus $\kappa_{110.001}$, for example, denotes this particular third-order cumulant derived from a bivariate 4-nomial. To condense the formulae we shall extend the $p_{(\;)}$ notation already used in §3. Thus:

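$$p_{(12)} = p_{12} - p_1p'_2, \qquad p_{(12.)} = p_{12.} - p_1p'_2, \qquad p_{(1.2)} = p_{1.2} - p_1p''_2, \quad\text{etc.},$$

$$p_{(123)} = p_{123} - p_{12.}p''_3 - p_{1.3}p'_2 - p_{.23}p_1 + 2p_1p'_2p''_3, \quad\text{etc.},$$

in which $p_{12.}$ denotes the sum of all primary probabilities $p$ in which the first and second suffices are 1 and 2, etc. Single-suffix $p$'s are marginal probabilities as already defined:

$$\begin{aligned}
p_{(1234)} = p_{1234} &- (p_{123.}p'''_4 + p_{12.4}p''_3 + p_{1.34}p'_2 + p_{.234}p_1)\\
&- (p_{12..}p_{..34} + p_{1.3.}p_{.2.4} + p_{1..4}p_{.23.})\\
&+ 2(p_{12..}p''_3p'''_4 + p_{1.3.}p'_2p'''_4 + p_{1..4}p'_2p''_3 + p_{.23.}p_1p'''_4 + p_{.2.4}p_1p''_3 + p_{..34}p_1p'_2)\\
&- 6p_1p'_2p''_3p'''_4, \quad\text{etc.},
\end{aligned}$$

in which $p_{123.}$ denotes the sum of all primary probabilities in which the first, second and third suffices are 1, 2, 3, etc.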
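The combination $p_{(1234)}$ is the fourth joint cumulant, in a single trial, of the indicators of the four events concerned; multiplying by $s$ gives $\kappa_{1000.0100.0010.0001}$. The sketch below checks the explicit expansion against the central-moment form of a fourth joint cumulant:

$E(\delta_1\delta_2\delta_3\delta_4) - E(\delta_1\delta_2)E(\delta_3\delta_4) - E(\delta_1\delta_3)E(\delta_2\delta_4) - E(\delta_1\delta_4)E(\delta_2\delta_3)$.

The check uses a randomly generated, hypothetical joint distribution of four indicators.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
cells = np.array(list(itertools.product([0, 1], repeat=4)), dtype=float)  # 16 joint outcomes
w = rng.random(16)
w /= w.sum()                         # hypothetical single-trial joint probabilities

def P(*idx):
    """p with the listed positions fixed at 1 (e.g. P(0, 2) plays the role of p_{1.3.})."""
    return w @ np.prod(cells[:, list(idx)], axis=1)

p1, p2, p3, p4 = P(0), P(1), P(2), P(3)    # p_1, p'_2, p''_3, p'''_4
explicit = (P(0, 1, 2, 3)
            - (P(0, 1, 2) * p4 + P(0, 1, 3) * p3 + P(0, 2, 3) * p2 + P(1, 2, 3) * p1)
            - (P(0, 1) * P(2, 3) + P(0, 2) * P(1, 3) + P(0, 3) * P(1, 2))
            + 2 * (P(0, 1) * p3 * p4 + P(0, 2) * p2 * p4 + P(0, 3) * p2 * p3
                   + P(1, 2) * p1 * p4 + P(1, 3) * p1 * p3 + P(2, 3) * p1 * p2)
            - 6 * p1 * p2 * p3 * p4)

dev = cells - w @ cells              # deviations of the indicators from their means

def m(*idx):
    """Central product moment of the listed indicators."""
    return w @ np.prod(dev[:, list(idx)], axis=1)

kappa4 = m(0, 1, 2, 3) - m(0, 1) * m(2, 3) - m(0, 2) * m(1, 3) - m(0, 3) * m(1, 2)
print(explicit, kappa4)
```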

6. Patterns and cumulants from a $(5\times4\times3\times2)$ table, to the fourth order.

$$\begin{array}{lrl}
\text{Order} & \text{Number of cases} & \text{Cumulant}\\
\hline
\text{1st} & 10 & \kappa_{1} = sp\\
\hline
\text{2nd} & 10 & \kappa_{2} = spq\\
 & 10 & \kappa_{11} = -s\,p_1p_2\\
 & 10 & \kappa_{1.1} = s\,p_{(11)}\\
 & 25 & \kappa_{10.01} = s\,p_{(12)}\\
\hline
\text{3rd} & 10 & \kappa_{3} = spq(q-p)\\
 & 20 & \kappa_{21} = -s\,p_1p_2(q_1-p_1)\\
 & 20 & \kappa_{2.1} = s\,p_{(11)}(q-p)\\
 & 50 & \kappa_{20.01} = s\,p_{(12)}(q_1-p_1)\\
 & & \kappa_{111} = 2s\,p_1p_2p_3\\
 & & \kappa_{11.10} = -s\{p_1p_{(21)} + p_2p_{(11)}\}\\
 & & \kappa_{110.001} = -s\{p_1p_{(23)} + p_2p_{(13)}\}\\
 & & \kappa_{1.1.1} = s\,p_{(111)}\\
 & & \kappa_{10.10.01} = s\,p_{(112)}\\
 & & \kappa_{100.010.001} = s\,p_{(123)}\\
\hline
\text{4th} & 20 & \kappa_{31} = -s\,p_1p_2(1-6p_1q_1)\\
 & 10 & \kappa_{22} = -s\,p_1p_2\{(q_1-p_1)(q_2-p_2) + 2p_1p_2\}\\
 & 20 & \kappa_{3.1} = s\,p_{(11)}(1-6pq)\\
 & 10 & \kappa_{2.2} = s\,p_{(11)}\{(q-p)(q'-p') - 2p_{(11)}\}\\
 & 50 & \kappa_{30.01} = s\,p_{(12)}(1-6p_1q_1)\\
 & & \kappa_{20.02} = s\,p_{(12)}\{(q_1-p_1)(q'_2-p'_2) - 2p_{(12)}\}\\
 & & \kappa_{211} = 2s\,p_1p_2p_3(q_1-2p_1)\\
 & & \kappa_{21.10} = -s\{p_1p_{(21)}(q_1-p_1) + p_2p_{(11)}(q_1-3p_1)\}\\
 & & \kappa_{12.10} = -s\{p_1p_{(21)}(q_2-3p_2) + p_2p_{(11)}(q_2-p_2)\}\\
 & & \kappa_{11.20} = -s\{(p_1p_{(21)} + p_2p_{(11)})(q'_1-p'_1) + 2p_{(11)}p_{(21)}\}\\
 & & \kappa_{210.001} = -s\{p_1p_{(23)}(q_1-p_1) + p_2p_{(13)}(q_1-3p_1)\}\\
 & & \kappa_{110.002} = -s\{(p_1p_{(23)} + p_2p_{(13)})(q'_3-p'_3) + 2p_{(13)}p_{(23)}\}\\
 & & \kappa_{2.1.1} = s\{p_{(111)}(q-p) - 2p_{(11.)}p_{(1.1)}\}\\
 & & \kappa_{20.10.01} = s\{p_{(112)}(q_1-p_1) - 2p_{(11.)}p_{(1.2)}\}\\
 & & \kappa_{10.10.02} = s\{p_{(112)}(q''_2-p''_2) - 2p_{(1.2)}p_{(.12)}\}\\
 & & \kappa_{200.010.001} = s\{p_{(123)}(q_1-p_1) - 2p_{(12.)}p_{(1.3)}\}\\
 & & \kappa_{1111} = -6s\,p_1p_2p_3p_4\\
 & & \kappa_{111.100} = 2s\{p_2p_3p_{(11)} + p_1p_3p_{(21)} + p_1p_2p_{(31)}\}\\
 & & \kappa_{0111.1000} = 2s\{p_3p_4p_{(21)} + p_2p_4p_{(31)} + p_2p_3p_{(41)}\}\\
 & & \kappa_{11.11} = -s\{(p_{(11)}-p_1p'_1)(p_{(22)}-p_2p'_2) + (p_{(12)}-p_1p'_2)(p_{(21)}-p_2p'_1) - 2p_1p_2p'_1p'_2\}\\
 & & \kappa_{110.011} = -s\{(p_{(12)}-p_1p'_2)(p_{(23)}-p_2p'_3) + (p_{(13)}-p_1p'_3)(p_{(22)}-p_2p'_2) - 2p_1p_2p'_2p'_3\}\\
 & & \kappa_{1100.0011} = -s\{(p_{(13)}-p_1p'_3)(p_{(24)}-p_2p'_4) + (p_{(14)}-p_1p'_4)(p_{(23)}-p_2p'_3) - 2p_1p_2p'_3p'_4\}\\
 & & \kappa_{11.10.10} = -s\{p_1p_{(211)} + p_2p_{(111)} + p_{(11.)}p_{(2.1)} + p_{(1.1)}p_{(21.)}\}\\
 & & \kappa_{11.10.01} = -s\{p_1p_{(212)} + p_2p_{(112)} + p_{(11.)}p_{(2.2)} + p_{(1.2)}p_{(21.)}\}\\
 & & \kappa_{100.100.011} = -s\{p''_2p_{(113)} + p''_3p_{(112)} + p_{(1.2)}p_{(.13)} + p_{(1.3)}p_{(.12)}\}\\
 & & \kappa_{100.010.011} = -s\{p''_2p_{(123)} + p''_3p_{(122)} + p_{(1.2)}p_{(.23)} + p_{(1.3)}p_{(.22)}\}\\
 & & \kappa_{1000.0100.0011} = -s\{p''_3p_{(124)} + p''_4p_{(123)} + p_{(1.3)}p_{(.24)} + p_{(1.4)}p_{(.23)}\}\\
 & & \kappa_{1.1.1.1} = s\,p_{(1111)}\\
 & & \kappa_{10.10.10.01} = s\,p_{(1112)}\\
 & & \kappa_{10.10.01.01} = s\,p_{(1122)}\\
 & & \kappa_{100.100.010.001} = s\,p_{(1123)}\\
 & & \kappa_{1000.0100.0010.0001} = s\,p_{(1234)}
\end{array}$$


The above formulae were worked out directly, and have been checked by condensation from the last one of each order. The rules are simple. Suppose, for example, we wish to derive $\kappa_{10.10.01.01}$ from $\kappa_{1000.0100.0010.0001}$, a case in which the number of variates is unchanged. We first put $2 = 1$ and $4 = 3$, and then write 3 as 2. This changes $p_{(1234)}$ into $p_{(1122)}$. Now suppose we wish to derive $\kappa_{20.01.01}$ from $\kappa_{10.10.01.01}$. To do this we coalesce the first two variates. We find that

$$p_{1122} = p_{1.22}, \quad p_{112.} = p_{1.2.}, \quad p_{11.2} = p_{1..2}, \quad p_{11..} = p_{1...} = p_1.$$

We then suppress a dot in the first or second place, while $p'_1$ is read $p_1$, $p''_2$ is read $p'_2$ and $p'''_2$ is read $p''_2$. We then have

$$\begin{aligned}
\kappa_{20.01.01} &= s\{p_{122} - (p_{12.}p''_2 + p_{1.2}p'_2 + 2p_{122}p_1) - (p_1p_{.22} + 2p_{12.}p_{1.2})\\
&\qquad + 2(p_1p'_2p''_2 + 2p_{12.}p_1p''_2 + 2p_{1.2}p_1p'_2 + p_{.22}p_1^2) - 6p_1^2p'_2p''_2\}\\
&= s\{p_{(122)}(q_1-p_1) - 2p_{(12.)}p_{(1.2)}\}.
\end{aligned}$$

On the other hand, if we want $\kappa_{10.11.01}$ from the same source, we must note that

$$p_{1122} = p_{.122} = p_{112.} = p_{.12.} = 0.$$

In the remaining terms we then suppress a dot in the second or third place, while $p''_2$ is read $p'_2$ and $p'''_2$ is read $p''_2$. We then get

$$\begin{aligned}
\kappa_{10.11.01} &= -s\{p_{112}p'_2 + p_{122}p'_1 + p_{11.}p_{.22} + p_{12.}p_{.12}\\
&\qquad - 2(p_{11.}p'_2p''_2 + p_{12.}p'_1p''_2 + p_{1.2}p'_1p'_2 + p_{.12}p_1p'_2 + p_{.22}p_1p'_1) + 6p_1p'_1p'_2p''_2\}\\
&= -s\{p'_1p_{(122)} + p'_2p_{(112)} + p_{(11.)}p_{(.22)} + p_{(12.)}p_{(.12)}\}.
\end{aligned}$$
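Both condensation results can be checked against a direct computation. In a single trial each $p_{(\ldots)}$ is a joint cumulant of category indicators, and each $\kappa$ above is $s$ times the corresponding single-trial cumulant. The sketch below uses a randomly generated, hypothetical single-trial table for three variates with 2, 3 and 3 categories. It computes the fourth joint cumulants from central moments and compares them with the two condensed formulae divided by $s$. The helper names `E` and `cum` are illustrative.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical single-trial table: w[c1, c2, c3] for categories c1 in 0..1, c2 in 0..2, c3 in 0..2.
w = rng.random((2, 3, 3))
w /= w.sum()
CELLS = list(itertools.product(*map(range, w.shape)))

def E(*fs):
    """Expectation of the product of the functions fs of a cell."""
    return sum(w[c] * np.prod([f(c) for f in fs]) for c in CELLS)

def cum(*fs):
    """Joint cumulant of order 2, 3 or 4, computed from central moments."""
    d = [lambda c, f=f, m=E(f): f(c) - m for f in fs]
    if len(d) < 4:
        return E(*d)
    a, b, c, e = d
    return E(a, b, c, e) - E(a, b) * E(c, e) - E(a, c) * E(b, e) - E(a, e) * E(b, c)

X = lambda c: float(c[0] == 1)       # first variate, category 1
Y1 = lambda c: float(c[1] == 1)      # second variate, category 1
Y2 = lambda c: float(c[1] == 2)      # second variate, category 2
Z = lambda c: float(c[2] == 2)       # third variate, category 2

# kappa_{20.01.01} / s = p_(122)(q_1 - p_1) - 2 p_(12.) p_(1.2)
lhs1 = cum(X, X, Y2, Z)
rhs1 = cum(X, Y2, Z) * (1 - 2 * E(X)) - 2 * cum(X, Y2) * cum(X, Z)

# kappa_{10.11.01} / s = -{p'_1 p_(122) + p'_2 p_(112) + p_(11.) p_(.22) + p_(12.) p_(.12)}
lhs2 = cum(X, Y1, Y2, Z)
rhs2 = -(E(Y1) * cum(X, Y2, Z) + E(Y2) * cum(X, Y1, Z)
         + cum(X, Y1) * cum(Y2, Z) + cum(X, Y2) * cum(Y1, Z))
print(lhs1, rhs1)
print(lhs2, rhs2)
```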

These rules are obviously useful if higher-order results are required, for from the most general results of the second, third and fourth orders we can conjecture similar results for higher orders and can then coalesce. Thus the general fifth-order result is $s\,p_{(12345)}$, where

$$\begin{aligned}
p_{(12345)} = p_{12345} &- (p_{1234.}p''''_5 + \cdots \text{ 5 terms}) - (p_{123..}p_{...45} + \cdots \text{ 10 terms})\\
&+ 2(p_{123..}p'''_4p''''_5 + \cdots \text{ 10 terms}) + 2(p_{12...}p_{..34.}p''''_5 + \cdots \text{ 15 terms})\\
&- 6(p_{12...}p''_3p'''_4p''''_5 + \cdots \text{ 10 terms}) + 24\,p_1p'_2p''_3p'''_4p''''_5.
\end{aligned}$$

Also the general sixth-order result is $s\,p_{(123456)}$, where

$$\begin{aligned}
p_{(123456)} = p_{123456} &- (p_{12345.}p'''''_6 + \cdots \text{ 6 terms}) - (p_{1234..}p_{....56} + \cdots \text{ 15 terms})\\
&- (p_{123...}p_{...456} + \cdots \text{ 10 terms})\\
&+ 2!\,(p_{1234..}p''''_5p'''''_6 + \cdots \text{ 15 terms}) + 2!\,(p_{123...}p_{...45.}p'''''_6 + \cdots \text{ 60 terms})\\
&+ 2!\,(p_{12....}p_{..34..}p_{....56} + \cdots \text{ 15 terms})\\
&- 3!\,(p_{123...}p'''_4p''''_5p'''''_6 + \cdots \text{ 20 terms}) - 3!\,(p_{12....}p_{..34..}p''''_5p'''''_6 + \cdots \text{ 45 terms})\\
&+ 4!\,(p_{12....}p''_3p'''_4p''''_5p'''''_6 + \cdots \text{ 15 terms}) - 5!\,p_1p'_2p''_3p'''_4p''''_5p'''''_6.
\end{aligned}$$
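The coefficients and term counts in these expansions follow the usual expression of a joint cumulant in terms of joint moments. That expression is a sum over all partitions of the suffixes into blocks, where a partition into $b$ blocks carries the coefficient $(-1)^{b-1}(b-1)!$. The sketch below enumerates set partitions and tabulates, for each block-size pattern, the coefficient and the number of terms. The function names are illustrative. For the fifth order it reproduces 5, 10, 10, 15, 10 terms with coefficients $-1, -1, 2, 2, -6$. For the sixth order it reproduces the counts 6, 15, 10, 15, 60, 15, 20, 45, 15.

```python
from collections import Counter
from math import factorial

def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # put `first` into each existing block in turn, or into a block of its own
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def expansion_terms(order):
    """Map each block-size pattern to (coefficient, number of terms) in p_(12...order)."""
    patterns = Counter(
        tuple(sorted((len(block) for block in part), reverse=True))
        for part in set_partitions(list(range(order)))
    )
    return {pat: ((-1) ** (len(pat) - 1) * factorial(len(pat) - 1), count)
            for pat, count in patterns.items()}

for order in (5, 6):
    for pat, (coef, count) in sorted(expansion_terms(order).items(), key=lambda kv: len(kv[0])):
        print(order, pat, coef, count)
```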

It only remains to add that a similar set of cumulants to those of this section exists for the corresponding Pascal distribution, in which $p$ is replaced by $a$, $q$ by $b$, and all the signs are positive.


REFERENCES 

Frisch, R. (1925). C.R. Acad. Sci., Paris, 181, 274.
Gotaas, P. (1936). Skand. AktuarTidskr. 19, 200.
Guldberg, S. (1935). Skand. AktuarTidskr. 18, 270.
Haldane, J. B. S. (1940). Biometrika, 31, 302.
Qvale, P. (1932). Skand. AktuarTidskr. 15, 196.
Wishart, J. (1947). J. Inst. Actu. Stud. Soc. 6, 140.



[ 59 ] 


ON THE WISHART DISTRIBUTION IN STATISTICS 

By A. C. AITKEN, D.Sc., F.R.S.

Mathematical Institute, 16 Chambers Street, Edinburgh 
1. Introductory 

Wishart's distribution (Wishart, 1928; Wishart & Bartlett, 1933) is the probability distribution of the estimates of the $\frac12k(k+1)$ moments of the second order, usually called variances and covariances, from a sample of $n$ $k$-ary vectors drawn from a $k$-variate normal correlated population. Let the probability differential of the population be

$$dp = (2\pi)^{-\frac12k}\,|V|^{-\frac12}\exp\bigl(-\tfrac12x'V^{-1}x\bigr)\,dx \qquad (1)$$

in matrix notation, where $V = [v_{ij}]$ is the variance matrix and $dx$ is the element of volume in the variate space. It is assumed that $n$ sample vectors

$$x_h = \{x_{h1}, x_{h2}, \ldots, x_{hk}\} \qquad (h = 1, 2, \ldots, n) \qquad (2)$$

have been drawn from (1). One evaluates in the usual way the $k$ sample means $\bar x_j$ $(j = 1, 2, \ldots, k)$ and then the $\frac12k(k+1)$ estimates of the $v_{ij}$, namely,

$$\hat v_{ij} = \frac{1}{n-1}\sum_{h=1}^{n}(x_{hi}-\bar x_i)(x_{hj}-\bar x_j). \qquad (3)$$

Wishart's distribution is that of the $\hat v_{ij}$ and may be written in matrix notation as

$$dp = c_{kn}\exp\bigl[-\tfrac12\operatorname{tr}(n-1)V^{-1}\hat V\bigr]\,|V|^{-\frac12(n-1)}\,|\hat V|^{\frac12(n-k-2)}\,d\hat v, \qquad (4)$$

where

$$c_{kn} = \{\Gamma(\tfrac12)\}^{-\frac12k(k-1)}\,\{\tfrac12(n-1)\}^{\frac12k(n-1)}\prod_{h=1}^{k}\{\Gamma(\tfrac12(n-h))\}^{-1}.$$

Here $\hat V = [\hat v_{ij}]$, a necessarily positive definite matrix, $\operatorname{tr}M$ means the trace or sum of diagonal elements of $M$, and $d\hat v$ is the element of volume in the $\frac12k(k+1)$-dimensional space of the $\hat v_{ij}$.

Various methods have been given for the derivation of this valuable distribution, which evidently generalizes to the case of a symmetric and positive definite matrix variate the familiar gamma distribution of a scalar variate. A review of these methods is given in a recent paper (Wishart, 1948) by the original discoverer himself. His own first method was straightforward: to transform the $nk$-fold sample normal differential by introducing 'quadratic co-ordinates', and to integrate away with respect to the undesired residual variables. Such a procedure can be, and has been, described in geometrical language. The later method of Wishart and Bartlett (1933) consisted in constructing the multiple Fourier transform of the $\hat v_{ij}$ and reciprocating it by the use of an important lemma (Ingham, 1933) which generalized Hankel's contour integral for the $\Gamma$-function to the case of $\frac12k(k+1)$ variables, elements of a positive definite Hermitian matrix. Yet again, other methods (e.g. Hsu, 1939) depend on an induction. But we may refer to Wishart's paper (1948) for further information on the history of the problem.

Now it would seem that, whatever derivation is adopted, one can hardly expect to avoid an encounter with the analytical theory of non-negative definite quadratic and bilinear forms; and in fact a fundamental lemma, which stands to Ingham's lemma as Euler's integral for the $\Gamma$-function stands to Hankel's contour integral, is to be found in a paper on the above theory by Siegel (1935). This lemma is quoted, proved differently, and used in another context by Gårding (1947).

We establish this lemma in § 3. It is necessary to mention in the first place some general 
considerations on matrix transformations and associated vector transformations. 

2. Matrix transformations and related vector transformations

Let $X$ be an arbitrary rectangular matrix of order $m\times n$. Let us suppose its elements written down, row after row, as the $mn$ ordered elements of a vector

$$\xi = \{x_{11}\,x_{12}\ldots x_{1n}\;x_{21}\ldots x_{2n}\;x_{31}\ldots x_{mn}\}. \qquad (1)$$

Now let $Y = H'XK$, where $H$ and $K$ may be rectangular, and let a vector $\eta$ be written down for $Y$ in the same way as (1). By inspection of the $(i,j)$th element of $Y$, namely $\sum_{k,l}h_{ki}x_{kl}k_{lj}$, and of the coefficient of $x_{kl}$ in it, namely $h_{ki}k_{lj}$, we see that

$$\eta = (H'\times K')\,\xi, \qquad (2)$$

the transforming matrix being the direct product. In particular, if $H$ is of order $m\times m$ and $K$ of order $n\times n$, the Jacobian of the transformation (2), by a well-known result, is equal to $|H|^n|K|^m$.

Next consider $X = X'$, of order $n\times n$, congruently transformed to $Y = H'XH$. Let $\xi$ be now the vector made from the $\frac12n(n+1)$ elements in and above the diagonal of $X$, written down row after row as before; and let $\eta$ be written down likewise for $Y$. By a similar inspection of the typical element in $H'XH$, account being taken of the fact that $k_{lj}$ in (2) is now $h_{lj}$, we arrive at the very useful result

(3) 

The transforming matrix here is the 'second induced' or 'second Schläflian' matrix of $H$, sometimes called the 'symmetrized direct square'; namely, the matrix which, when $H$ transforms a vector, transforms the squares and binary products, duly ordered, of elements of that vector. Again, by a well-known result, the Jacobian of the transformation (3) is $|H|^{n+1}$.
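The two Jacobian results can be checked numerically. The sketch below relies on two numpy conventions: `ravel()` stacks a matrix row after row, and `numpy.kron(A, B)` is the direct product $[a_{ij}B]$. Under those conventions the transforming matrix for $Y = H'XK$ is `numpy.kron(H.T, K.T)`. For the congruent transformation, the sketch builds the matrix of $X \to H'XH$ on the elements on and above the diagonal, one column at a time from symmetric unit matrices. It then compares the determinant with $|H|^{n+1}$. The random matrices are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Row-by-row vectors: Y = H' X K with H (m x m) and K (n x n).
m, n = 3, 4
X = rng.standard_normal((m, n))
H = rng.standard_normal((m, m))
K = rng.standard_normal((n, n))
T2 = np.kron(H.T, K.T)                         # transforming matrix for the row-stacked vectors
ok_vec = np.allclose((H.T @ X @ K).ravel(), T2 @ X.ravel())
ok_jac = np.isclose(np.linalg.det(T2), np.linalg.det(H) ** n * np.linalg.det(K) ** m)

def congruence_matrix(H):
    """Matrix of X -> H' X H on the elements on and above the diagonal, row by row."""
    n = H.shape[0]
    idx = [(i, j) for i in range(n) for j in range(i, n)]
    T = np.zeros((len(idx), len(idx)))
    for col, (a, b) in enumerate(idx):
        E = np.zeros((n, n))
        E[a, b] = E[b, a] = 1.0                # symmetric X whose free element x_ab is 1
        Y = H.T @ E @ H
        T[:, col] = [Y[i, j] for (i, j) in idx]
    return T

Hs = rng.standard_normal((3, 3))
ok_sym = np.isclose(np.linalg.det(congruence_matrix(Hs)), np.linalg.det(Hs) ** (3 + 1))
print(ok_vec, ok_jac, ok_sym)
```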

3. The lemma of Siegel and its application

The lemma is as follows: given $S$, an arbitrary positive definite real symmetric matrix, and $T$, a variable positive definite real symmetric matrix, both of order $k\times k$. Integration being over the domain of positive-definiteness of $T$, it is asserted that*

$$\int_{T>0}\exp(-\operatorname{tr}ST)\,|T|^{\,n-\frac12(k+1)}\,dT = \{\Gamma(\tfrac12)\}^{\frac12k(k-1)}\prod_{h=0}^{k-1}\Gamma(n-\tfrac12h)\;|S|^{-n}. \qquad (1)$$
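For $k = 1$ the lemma reduces to Euler's integral for the $\Gamma$-function, which is the sense in which it stands to Ingham's lemma as Euler's integral stands to Hankel's contour integral. Write $T = t$ and $S = \sigma$. The factor $\{\Gamma(\frac12)\}^{0}$ is then 1 and the product has the single term $h = 0$, so (1) reads:

```latex
\int_0^\infty e^{-\sigma t}\, t^{\,n-1}\, dt \;=\; \Gamma(n)\,\sigma^{-n}, \qquad \sigma > 0,\; n > 0 .
```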

We follow Gårding in employing the 'triangular matrix' transformation. First, we remark that $S$ may be factorized uniquely into $HH'$, where $H$ is 'positive lower triangular' (p.l.t.), i.e. $h_{ii} > 0$ and $h_{ij} = 0$ for $i < j$. For it is known that a positive definite symmetric matrix $S$ may be expressed as $MM'$, where $M$ is a real matrix, with an indeterminacy in respect of $M$, in that $MK$, where $K$ is orthogonal, would serve as well. However, if $M$ is l.t., as can be the case, since any matrix can be reduced orthogonally to triangular shape, and $MK$ is likewise l.t., then $K$ is l.t. Now the only orthogonal l.t. matrix with positive diagonal elements is $I$, the unit matrix; and so the p.l.t. resolution of $S$ is unique. A similar result holds for positive upper triangular factorization; and the extension to Hermitian matrices, in the shape $H\bar H'$, is evident.

* The Editors kindly recall to me that Siegel's lemma, in an equivalent form, is stated and used in Cramér's Mathematical Methods of Statistics (1946), pp. 390-4.





Now in (1) put $S = HH'$, $H'TH = U = XX'$, where $X$ is p.l.t. also. We then have $\operatorname{tr}(ST) = \operatorname{tr}(HUH^{-1}) = \operatorname{tr}U = \operatorname{tr}(XX')$, the sum of squares of all the $x_{ij}$. The Jacobian matrix of the transformation $U\to X$ is again l.t., and so its determinant is equal to the product of its diagonal elements, namely $2^kx_{11}^{k}x_{22}^{k-1}x_{33}^{k-2}\cdots x_{kk}$. The Jacobian of the transformation $T\to U$ is the reciprocal of $|H^{[2]}|$, namely $|H|^{-(k+1)} = |S|^{-\frac12(k+1)}$. Thus the left-hand member of (1) is reduced to

$$2^k\int\exp(-\operatorname{tr}XX')\,|S|^{-n+\frac12(k+1)}\,|S|^{-\frac12(k+1)}\,x_{11}^{2n-1}x_{22}^{2n-2}\cdots x_{kk}^{2n-k}\,d\xi = \{\Gamma(\tfrac12)\}^{\frac12k(k-1)}\prod_{h=0}^{k-1}\Gamma(n-\tfrac12h)\,|S|^{-n}, \qquad (2)$$

since the integral has been resolved into the product of $\frac12k(k+1)$ integrals of familiar type.
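The Jacobian of $U = XX'$ with respect to the free elements of the p.l.t. matrix $X$ can be checked directly. Since $U$ is quadratic in $X$, its derivative in the direction $E$ is exactly $EX' + XE'$, and the sketch below builds the Jacobian matrix from that. It confirms that the matrix is lower triangular when elements are taken row by row. It then compares the determinant with $2^kx_{11}^kx_{22}^{k-1}\cdots x_{kk}$ for a hypothetical $3\times3$ matrix.

```python
import numpy as np

def lower_indices(k):
    """Free elements (i, j), i >= j, of a k x k lower-triangular matrix, taken row by row."""
    return [(i, j) for i in range(k) for j in range(i + 1)]

def jacobian_U_wrt_X(X):
    """Jacobian matrix of U = X X' (elements on and below the diagonal) w.r.t. the free elements of X."""
    k = X.shape[0]
    idx = lower_indices(k)
    J = np.zeros((len(idx), len(idx)))
    for col, (a, b) in enumerate(idx):
        E = np.zeros((k, k))
        E[a, b] = 1.0
        dU = E @ X.T + X @ E.T            # exact derivative of X X' in the direction E
        J[:, col] = [dU[i, j] for (i, j) in idx]
    return J

def closed_form(X):
    """2^k x_11^k x_22^(k-1) ... x_kk, as stated in the text."""
    k = X.shape[0]
    return 2.0 ** k * np.prod([X[i, i] ** (k - i) for i in range(k)])

X = np.array([[1.5, 0.0, 0.0],
              [0.3, 2.0, 0.0],
              [-0.7, 0.4, 0.8]])           # a hypothetical p.l.t. matrix
J = jacobian_U_wrt_X(X)
print(np.allclose(J, np.tril(J)))          # the Jacobian matrix is lower triangular
print(np.linalg.det(J), closed_form(X))    # both about 86.4 (= 8 * 1.5**3 * 2.0**2 * 0.8)
```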

It will be interesting at this stage to invert the order of demonstration and to obtain by Siegel's lemma the multiple Laplace transform of Wishart's function. The integrand will be

$$\begin{aligned}
&c_{kn}\exp(-\operatorname{tr}S\hat V)\exp\{-\tfrac12\operatorname{tr}(n-1)V^{-1}\hat V\}\,|V|^{-\frac12(n-1)}|\hat V|^{\frac12(n-k-2)}\\
&\quad = c_{kn}\exp\bigl[-\operatorname{tr}\{\tfrac12(n-1)V^{-1} + S\}\hat V\bigr]\,|V|^{-\frac12(n-1)}|\hat V|^{\frac12(n-k-2)}.
\end{aligned}$$

Thus the transform in question is

$$\Bigl|I + \frac{2VS}{n-1}\Bigr|^{-\frac12(n-1)}, \qquad (4)$$

since all the $\Gamma$-functions indicated in $c_{kn}$ cancel out; or we might merely have observed that in a Laplace transform, or moment-generating function (m.g.f.), of a probability function the constant term must be unity. Now this m.g.f. in (4) is known otherwise (Aitken, 1931; Wishart & Bartlett, 1933) to be indeed the Laplace transform or m.g.f. of the $\hat v_{ij}$. Hence, provided reciprocation is unique, we may pass from this m.g.f. to Wishart's function by the use of Siegel's lemma. This uniqueness is assured by the boundedness and continuity of the integrand in any domain of the space.
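The transform $|I + 2VS/(n-1)|^{-\frac12(n-1)}$ lends itself to a Monte Carlo check. The sketch below uses hypothetical values of $V$ and $S$, with $k = 2$ and $n = 6$. It draws many normal samples, forms $\hat V$ with divisor $n-1$, and compares the average of $\exp(-\operatorname{tr}S\hat V)$ with that determinantal expression. This follows the Laplace-transform sign convention of the integrand above. Agreement is only to Monte Carlo accuracy.

```python
import numpy as np

rng = np.random.default_rng(42)
k, n, R = 2, 6, 20000                       # hypothetical dimension, sample size, replications
V = np.array([[1.0, 0.5], [0.5, 2.0]])      # hypothetical population variance matrix
S = np.array([[0.3, 0.1], [0.1, 0.2]])      # hypothetical positive definite transform variable

# R independent samples of n vectors from N(0, V), and their estimates V-hat (divisor n - 1)
x = rng.multivariate_normal(np.zeros(k), V, size=(R, n))        # shape (R, n, k)
dev = x - x.mean(axis=1, keepdims=True)
Vhat = np.einsum("rhi,rhj->rij", dev, dev) / (n - 1)

# Monte Carlo estimate of E exp(-tr S V-hat) against the closed form
mc = np.exp(-np.einsum("ij,rji->r", S, Vhat)).mean()
exact = np.linalg.det(np.eye(k) + 2 * V @ S / (n - 1)) ** (-(n - 1) / 2)
print(mc, exact)                            # about 0.49 for these values
```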


4. The moment-generating function of the estimates

We conclude by giving a revised version of the derivation of the m.g.f. of the $\hat v_{ij}$. Assigning $s_{ii}$ as moment-carrying variable for $\hat v_{ii}$, and $2s_{ij}$ for $\hat v_{ij}$ $(i\neq j)$, we have for the m.g.f. the $nk$-fold integral

The quadratic form in the exponent of the integrand is $\frac12x'Qx$, where $Q$ is a positive-definite matrix partitioned into submatrices of order $k\times k$, thus,

$$Q = \begin{bmatrix}
V^{-1}+\dfrac{2S}{n} & -\dfrac{2S}{n(n-1)} & \cdots & -\dfrac{2S}{n(n-1)}\\[2mm]
-\dfrac{2S}{n(n-1)} & V^{-1}+\dfrac{2S}{n} & \cdots & -\dfrac{2S}{n(n-1)}\\
\vdots & & \ddots & \vdots\\
-\dfrac{2S}{n(n-1)} & -\dfrac{2S}{n(n-1)} & \cdots & V^{-1}+\dfrac{2S}{n}
\end{bmatrix}. \qquad (2)$$






We transform this to $HQH^{-1}$, where

rl -I 

I -I 

r:r I -I 


"III 
I I 


H~ l = 


$\bigl(I + 2VS/(n-1)\bigr)$ isolated down the diagonal, and a last row of submatrices not involving $S$. So, using the standard result on the integral of $\exp(-\frac12x'Qx)$, and recalling again that in any m.g.f. the constant term is 1, we arrive at the determinantal value

$$\Bigl|I + \frac{2VS}{n-1}\Bigr|^{-\frac12(n-1)},$$

and now reciprocation can proceed.
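The determinantal value can be confirmed for a particular case. The sketch assumes the integrand carries $\exp(-\operatorname{tr}S\hat V)$ together with the normal density of the sample, so that the exponent is $-\frac12\sum_h x_h'V^{-1}x_h - \operatorname{tr}S\hat V$. Expanding $\hat V$ then gives the block matrix $Q$, with $V^{-1} + 2S/n$ down the diagonal and $-2S/\{n(n-1)\}$ elsewhere. With hypothetical $V$, $S$, $n = 5$ and $k = 2$, the sketch first checks that quadratic form on a random sample. It then checks that the Gaussian integral $|V|^{-\frac12n}|Q|^{-\frac12}$ equals $|I + 2VS/(n-1)|^{-\frac12(n-1)}$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 5, 2                                    # hypothetical sample size and dimension
V = np.array([[1.0, 0.3], [0.3, 0.5]])         # hypothetical variance matrix
S = np.array([[0.4, -0.1], [-0.1, 0.2]])       # hypothetical positive definite S
Vi = np.linalg.inv(V)

# Q: diagonal k x k blocks V^{-1} + 2S/n, off-diagonal blocks -2S/{n(n-1)}
I, J = np.eye(n), np.ones((n, n))
Q = np.kron(I, Vi + 2 * S / n) - np.kron(J - I, 2 * S / (n * (n - 1)))

# Check the quadratic form on a random sample x_1, ..., x_n stacked as one nk-vector
x = rng.standard_normal((n, k))
Vhat = np.cov(x, rowvar=False)                 # divisor n - 1
quad = 0.5 * x.ravel() @ Q @ x.ravel()
direct = 0.5 * np.einsum("hi,ij,hj->", x, Vi, x) + np.trace(S @ Vhat)

# Gaussian integral over the nk-fold space, normalized by the sample density
mgf = np.linalg.det(V) ** (-n / 2) * np.linalg.det(Q) ** (-0.5)
target = np.linalg.det(np.eye(k) + 2 * V @ S / (n - 1)) ** (-(n - 1) / 2)
print(np.isclose(quad, direct), mgf, target)
```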


5. Modification of Wishart's distribution

In the case of each variate we took the mean $\bar x_i$, and estimated the $v_{ij}$ in terms of deviations from these means. In other words, we fitted constants, namely means, by least squares to each set of $n$ observations of a variate $x_i$. But cases are encountered in which not a constant, but a polynomial of assigned degree, or so many harmonic terms, or some linear combination of arbitrary independent functions, or the like, are fitted to the variates, the $v_{ij}$ being then estimated from the residuals after the fitting. In an earlier paper (Aitken, 1946) we have examined such cases in some detail and have shown that, provided all variates are fitted to the same orthonormal functional basis of $l$ independent functions, the distribution of the $\hat v_{ij}$ has m.g.f.

$$\Bigl|I + \frac{2VS}{n-l}\Bigr|^{-\frac12(n-l)},$$

as one might indeed expect; but notice must be taken (ibid.) of strict conditions governing the functional representations. When these are satisfied, and the residuals are assumed to be a normal sample, Wishart's distribution still holds, with $n-1$ replaced by $n-l$.
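A simulation illustrates the modification. The sketch uses hypothetical values: $n = 12$ points, $k = 2$ variates, and an arbitrary quadratic trend. The $l = 3$ fitted functions are $1, t, t^2$, orthonormalized with `numpy.linalg.qr`. Every variate is fitted to the same basis by least squares, and the residual sums of products are divided by $n - l$. The averages come out close to $V$, which is consistent with replacing $n - 1$ by $n - l$. This checks only first moments, not the full distribution.

```python
import numpy as np

rng = np.random.default_rng(7)
n, k, l, R = 12, 2, 3, 20000                   # hypothetical sizes; l fitted functions
V = np.array([[1.0, 0.4], [0.4, 2.0]])         # hypothetical variance matrix
t = np.linspace(-1.0, 1.0, n)

# Orthonormal basis of l functions (here 1, t, t^2) evaluated at the n points
Phi, _ = np.linalg.qr(np.vander(t, l, increasing=True))     # n x l, orthonormal columns
M = np.eye(n) - Phi @ Phi.T                    # projection onto the residual space, trace n - l

# Each variate = arbitrary quadratic trend + correlated normal error
trend = np.outer(3 + 2 * t - 5 * t ** 2, [1.0, -0.5])        # n x k, removed by the fitting
x = trend + rng.multivariate_normal(np.zeros(k), V, size=(R, n))
res = np.einsum("ab,rbk->rak", M, x)          # least-squares residuals for every variate
Vhat = np.einsum("rai,raj->rij", res, res) / (n - l)
print(np.trace(M), Vhat.mean(axis=0))          # n - l, and approximately V
```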



REFERENCES 

Aitken, A. C. (1931). Some applications of generating functions to normal frequency. Quart. J. Math. (Oxford Series), 2, 130.

Aitken, A. C. (1946). On a problem in correlated errors. Proc. Roy. Soc. Edinb. 62, 273.

Gårding, Lars (1947). The solution of Cauchy's problem for two totally hyperbolic linear differential equations by means of Riesz integrals. Ann. Math. 48, 785.

Hsu, P. L. (1939). A new proof of the product-moment distribution. Proc. Camb. Phil. Soc. 35, 336.

Ingham, A. E. (1933). An integral which occurs in statistics. Proc. Camb. Phil. Soc. 29, 271.

Siegel, C. L. (1935). Ueber die analytische Theorie der quadratischen Formen. Ann. Math. 36, 527.

Wishart, J. (1928). The generalized product-moment distribution in samples from a normal population. Biometrika, 20A, 32.

Wishart, J. (1948). Proofs of the distribution law of the second order moment statistics. Biometrika, 35, 55.

Wishart, J. & Bartlett, M. S. (1933). The generalized product-moment distribution in a normal system. Proc. Camb. Phil. Soc. 29, 260.




[ 63 ] 


THE SPECTRAL THEORY OF DISCRETE STOCHASTIC PROCESSES

By P. A. P. MORAN, Institute of Statistics, Oxford University 

Consider a stationary stochastic process defined by a sequence $\{x_t\}$ of continuous variates, where $t = 0, \pm1, \pm2, \ldots$. Wold (1938) has proved the following fundamental theorem, which is the analogue for discrete processes of the theorem proved for continuous processes by Khintchine (1934).

Wold's Theorem. Let $\rho_k$ $(k = 0, \pm1, \ldots)$ be an arbitrary sequence of constants. Then a necessary and sufficient condition that there exists a discrete stationary process with these as serial correlation coefficients is that there exists a non-decreasing function $\omega(\theta)$ such that $\omega(0) = 0$, $\omega(\pi) = \pi$ and

$$\rho_k = \frac{1}{\pi}\int_0^{\pi}\cos k\theta\,d\omega(\theta). \qquad (1)$$

$\omega(\theta)$ is then known as the integrated power spectrum of the process and is fundamental in the theory of generalized harmonic analysis. When $\omega'(\theta)$ exists, it is known as the spectral density.

In the present paper we shall show how the study of the spectrum of a process and of a 
covariance generating function introduced by Quenouille (1947) can be used to simplify the 
theory of such processes and, in particular, to provide short proofs of the theorems of Slutzky 
(1937) and Romanovsky (1932, 1933). 

Wold shows that when $\omega'(\theta)$ exists in the interval $(0,\pi)$ it is given by

$$\omega'(\theta) = 1 + 2\sum_{k=1}^{\infty}\rho_k\cos k\theta. \qquad (2)$$
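As a concrete illustration of (1) and (2), consider the hypothetical first-order autoregressive process $x_t = a\,x_{t-1} + \epsilon_t$ with $|a| < 1$. For this process $\rho_k = a^{|k|}$, and (2) sums to $\omega'(\theta) = (1-a^2)/(1 - 2a\cos\theta + a^2)$. The sketch checks this closed form against the series. It then recovers $\rho_k$ from (1), with $d\omega(\theta) = \omega'(\theta)\,d\theta$, using a midpoint rule. The rule is very accurate here because the integrand is smooth and periodic.

```python
import numpy as np

a = 0.6                                    # hypothetical AR(1) coefficient: x_t = a x_{t-1} + e_t
N = 2000
theta = (np.arange(N) + 0.5) * np.pi / N   # midpoints of (0, pi)

# (2) with rho_k = a^k, against the closed form (1 - a^2) / (1 - 2a cos(theta) + a^2)
density = (1 - a ** 2) / (1 - 2 * a * np.cos(theta) + a ** 2)
series = 1 + 2 * sum(a ** k * np.cos(k * theta) for k in range(1, 200))
print(np.max(np.abs(density - series)))    # negligible truncation error

# (1): rho_k = (1/pi) * integral over (0, pi) of cos(k theta) omega'(theta) d(theta)
for k in range(5):
    rho_k = np.sum(np.cos(k * theta) * density) / N      # midpoint rule: (1/pi) * (pi/N) * sum
    print(k, rho_k, a ** k)
```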

$\omega'(\theta)$ will always exist for processes generated by taking an infinite moving average of a completely random process, whose weights form an absolutely convergent series. The series used in most studies of time series are of this type, for they are either finite moving averages of completely random series or the solutions of stationary stochastic difference equations. The latter are equivalent to an infinite moving average whose weights are dominated by a convergent geometric series. For simplicity it is convenient in what follows to restrict ourselves to series generated by taking a (possibly infinite) moving average whose weights are known to be dominated by a convergent geometric series. This restriction is convenient but not essential.

If we write $z = e^{i\theta}$ in (2), we obtain a function first introduced by Quenouille which generates the serial correlations. This is $\sum_{-\infty}^{\infty}\rho_kz^k$. We multiply this by $c_0$, thus replacing the serial correlation coefficients $\rho_k$ by the serial covariances $c_k = c_{-k} = E(x_tx_{t+k})$. We then have

$$S(z) = \sum_{-\infty}^{\infty}c_kz^k, \qquad (3)$$

which we call the covariance generating function. When the process is generated by a moving average whose weights are dominated by a convergent geometric series, the series (3) will also be dominated by a convergent geometric series and will be a Laurent series convergent in an annulus $1-\delta < |z| < 1+\delta$ for some $\delta > 0$. The coefficients $c_k$ will therefore be uniquely determined by $S(z)$.



It is convenient to call such processes 'Laurent processes'. In the more general case where the weights and therefore the $\rho_k$ are only known to form an absolutely convergent series, we have to use theorems on the uniqueness of Fourier series. Even in this case it is algebraically more convenient to write (3) as a Laurent series, although we may only know that it converges on $|z| = 1$. We also note that $S(e^{i\theta}) = c_0\,\omega'(\theta)$.

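Now suppose we have such a process $\{x_t\}$ with serial covariances $c_k$ and covariance generating function $S(z)$. Define a new process $\{y_t\}$ by

$$y_t = \sum_{i=0}^{\infty} a_i x_{t-i}, \qquad (4)$$

where $\sum_{i=0}^{\infty} a_i$ is dominated by a convergent geometric series. Then it is not difficult to prove that (4) is convergent with probability one and defines a stationary Laurent process. Write

$$\bar c_k = \bar c_{-k} = E(y_t y_{t+k}) = \sum_{i=0}^{\infty}\sum_{j=0}^{\infty} a_i a_j c_{k+i-j}.$$

Then

$$\bar S(z) = \sum_{-\infty}^{\infty}\bar c_k z^k = \sum_{k=-\infty}^{\infty}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty} a_i a_j c_{k+i-j} z^k = \Big(\sum_{i=0}^{\infty} a_i z^{-i}\Big)\Big(\sum_{j=0}^{\infty} a_j z^{j}\Big) S(z). \qquad (5)$$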

This is the fundamental formula in what follows and shows the effect on the covariance 
generating function of taking a moving average of the process. 
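Formula (5) can be checked numerically in a small case. The sketch below assumes, purely for illustration, that $\{x_t\}$ is itself a finite moving average (weights `w`) of unit-variance completely random variables, and that $\{y_t\}$ is formed from it with finite weights `a`; the helper names are hypothetical. It compares the coefficients of $(\sum a_i z^{-i})(\sum a_j z^j)S(z)$ with serial covariances computed directly from the combined weights.

```python
from collections import defaultdict

def laurent_mul(p, q):
    """Multiply Laurent polynomials stored as {power of z: coefficient}."""
    out = defaultdict(float)
    for i, a in p.items():
        for j, b in q.items():
            out[i + j] += a * b
    return dict(out)

def filter_factor(weights):
    """(sum_i w_i z^{-i}) (sum_j w_j z^j) for a one-sided moving average."""
    backward = {-i: w for i, w in enumerate(weights)}
    forward = {i: w for i, w in enumerate(weights)}
    return laurent_mul(backward, forward)

def convolve(a, b):
    """Weights g of the composite moving average y_t = sum_k g_k eta_{t-k}."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def serial_cov(weights, k):
    """E(y_t y_{t+k}) for y_t = sum_i w_i eta_{t-i}, eta unit-variance and completely random."""
    k = abs(k)
    return sum(weights[i] * weights[i + k] for i in range(len(weights) - k))

w = [1.0, 0.5]            # illustrative weights generating {x_t}
a = [1.0, -0.3, 0.2]      # illustrative weights a_0, a_1, a_2 in (4)

S_x = filter_factor(w)                     # c.g.f. of {x_t}; unit-variance noise has c.g.f. 1
S_y = laurent_mul(filter_factor(a), S_x)   # right-hand side of (5)
g = convolve(a, w)                         # direct route: y is a moving average of the noise

for k in range(-4, 5):
    print(k, S_y.get(k, 0.0), serial_cov(g, k))
```

Both columns agree, because composing two moving averages multiplies their factors in $z$ and $z^{-1}$.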


Application to stochastic difference equations

Quenouille (1947) has used this type of result to discuss the solutions of stochastic difference equations and in particular to calculate $\sum_{k=-\infty}^{\infty}\rho_k\rho_{k+i}$, which is useful in calculating the covariance between two sample serial correlation coefficients of different orders. Thus if we have

$$x_t + a_1 x_{t-1} + \dots + a_k x_{t-k} = \eta_t, \qquad (6)$$

where $\{\eta_t\}$ is a completely random process and the equation

$$z^k + a_1 z^{k-1} + \dots + a_k = 0 \qquad (7)$$

has all its roots inside the circle $|z| = 1$, we can see that $\{x_t\}$ is a Laurent process. For, multiplying (6) by $x_{t-s}$ $(s > 0)$ and taking expectations, we get

$$c_s + a_1 c_{s-1} + \dots + a_k c_{s-k} = 0,$$

and the condition on (7) implies that the solutions of this are dominated by the terms of a convergent geometric series. If $S_x(z)$ is the covariance generating function of $\{x_t\}$, we then have, since the c.g.f. of $\{\eta_t\}$ is 1,

$$1 = (1 + a_1 z + \dots + a_k z^k)(1 + a_1 z^{-1} + \dots + a_k z^{-k})\,S_x(z),$$

and so

$$S_x(z) = \big[(1 + a_1 z + \dots + a_k z^k)(1 + a_1 z^{-1} + \dots + a_k z^{-k})\big]^{-1}$$

and

$$c_0\,\omega'(\theta) = \big[(1 + a_1^2 + \dots + a_k^2) + 2(a_1 + a_1 a_2 + \dots + a_{k-1}a_k)\cos\theta + \dots + 2a_k\cos k\theta\big]^{-1}.$$
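As a check on the last formula, the sketch below takes $k = 2$ with illustrative coefficients $a_1 = -0.5$, $a_2 = 0.3$ (for which the roots of (7) have modulus $\sqrt{0.3} < 1$) and unit-variance $\eta_t$. It obtains the serial covariances from the moving-average representation of $x_t$, confirms that they satisfy $c_s + a_1 c_{s-1} + a_2 c_{s-2} = 0$ for $s > 0$, and compares $\sum_s c_s e^{is\theta}$ with the closed form for $c_0\,\omega'(\theta)$. The truncation points are arbitrary choices, and the names are hypothetical.

```python
import math

a1, a2 = -0.5, 0.3     # illustrative coefficients in (6) with k = 2
K = 400                # truncation point for the moving-average weights

# x_t = sum_j psi_j eta_{t-j}, where psi_j = -a1 psi_{j-1} - a2 psi_{j-2} and psi_0 = 1
psi = [1.0, -a1]
for j in range(2, K):
    psi.append(-a1 * psi[j - 1] - a2 * psi[j - 2])

def c(s):
    """Serial covariance c_s = E(x_t x_{t+s}) for unit-variance eta."""
    s = abs(s)
    return sum(psi[j] * psi[j + s] for j in range(K - s))

def spectrum_from_covariances(theta, terms=100):
    """c_0 omega'(theta) = sum_s c_s e^{i s theta} = c_0 + 2 sum_{s >= 1} c_s cos(s theta)."""
    return c(0) + 2 * sum(c(s) * math.cos(s * theta) for s in range(1, terms))

def spectrum_closed_form(theta):
    """The bracketed expression above with k = 2, inverted."""
    return 1.0 / ((1 + a1 ** 2 + a2 ** 2) + 2 * (a1 + a1 * a2) * math.cos(theta)
                  + 2 * a2 * math.cos(2 * theta))

for s in range(1, 4):
    print(s, c(s) + a1 * c(s - 1) + a2 * c(s - 2))   # recurrence residual, essentially zero
for theta in (0.0, 1.0, 2.5):
    print(theta, spectrum_from_covariances(theta), spectrum_closed_form(theta))
```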



P. A. P. Moran 


65 


We notice that, if $\{x_t\}$ is a process generated by the above relation (6), then

$$\xi_t = x_t + a_1 x_{t+1} + \dots + a_k x_{t+k}$$

has a c.g.f. equal to unity and zero serial covariances, and so is completely random if the process $\{x_t\}$ is Gaussian. This result is otherwise obvious, but is of interest in showing that processes of this type can be regarded as completely reversible.

The above results are also of interest in throwing some light on what happens when we define a process by means of an equation like (6) but drop the condition that the $\{\eta_t\}$ are random. If we know the c.g.f. of the $\{\eta_t\}$ process we can find that of the $\{x_t\}$ immediately by using (5). This is of value in studying multivariate processes, and Quenouille has suggested that these also be studied by using generating functions with several parameters instead of a single $z$. It is probably more illuminating, however, to use matrix notation and a single parameter. Since, apart from an investigation on sampling properties by Mann & Wald (1943), the theory of multivariate stochastic difference equations has not been set out in detail, we shall do so in the last section of this paper.


Slutzky’s theorem

Equation (5) enables us to give much shorter proofs of Slutzky’s sinusoidal limit theorem (1937) and its generalizations (Romanovsky, 1932, 1933). Consider a completely random series $\{x_t\}$ with finite variance and perform the operation of taking a moving sum of two, $x'_t = x_t + x_{t-1}$, $n$ times. Then take the $m$th difference of the resulting series. If we take a finite section $X_1, \dots, X_N$ of the resulting series, this will differ from a sine wave by more than any given relative error with a probability that tends to zero when $n$ tends to infinity and $mn^{-1}$ tends to a constant $A$ such that $0 < A < 1$. The period $L$ of the sine wave will be given by

$$\cos\frac{2\pi}{L} = \frac{1 - A}{1 + A}. \qquad (8)$$

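The c.g.f. of a completely random series is unity, and so Slutzky’s series will be a Laurent process whose c.g.f. is

$$S(z) = (1 - z^{-1})^m (1 - z)^m (1 + z^{-1})^n (1 + z)^n,$$

and putting $z = e^{i\theta}$ we get

$$c_0\,\omega'(\theta) = S(e^{i\theta}) = 2^{2(m+n)}\sin^{2m}(\tfrac12\theta)\cos^{2n}(\tfrac12\theta) = 2^{m+n}(1 - \cos\theta)^m (1 + \cos\theta)^n.$$

Then

$$c_0 = \frac{1}{\pi}\int_0^{\pi} c_0\,\omega'(\theta)\,d\theta = \frac{2^{2(m+n)}}{\pi}\int_0^{\pi}\sin^{2m}(\tfrac12\theta)\cos^{2n}(\tfrac12\theta)\,d\theta = \frac{2^{2(m+n)}\,\Gamma(m+\tfrac12)\,\Gamma(n+\tfrac12)}{\pi\,\Gamma(m+n+1)}.$$

Therefore

$$\omega'(\theta) = \frac{\pi\,\Gamma(m+n+1)}{\Gamma(m+\tfrac12)\,\Gamma(n+\tfrac12)}\sin^{2m}(\tfrac12\theta)\cos^{2n}(\tfrac12\theta) = \frac{\pi\,2^{-m-n}\,\Gamma(m+n+1)}{\Gamma(m+\tfrac12)\,\Gamma(n+\tfrac12)}(1 - \cos\theta)^m (1 + \cos\theta)^n. \qquad (9)$$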


Biometrika 36 



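Slutzky’s result arises from the fact that $\omega'(\theta)$ has a single peak, in the neighbourhood of which most of the spectral density is concentrated, and so $\omega(\theta)$ tends to a step function. The peak occurs at the point $\theta_0$ where $\omega''(\theta) = 0$. Thus

$$\cos\theta_0 = \frac{n - m}{n + m} \to \frac{1 - A}{1 + A}.$$

To prove that $\omega(\theta)$ tends to a step function we show that, as $n \to \infty$ and $mn^{-1} \to A$,

$$\lim\omega'(\theta) = 0 \quad\text{if } \theta \neq \cos^{-1}\frac{1-A}{1+A}, \qquad \lim\omega'(\theta) = \infty \quad\text{if } \theta = \cos^{-1}\frac{1-A}{1+A}.$$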
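The normalization in (9) and the location of the peak can be confirmed numerically. This sketch uses illustrative values $m = 3$, $n = 5$, evaluates $\omega'(\theta)$ on a midpoint grid over $(0, \pi)$ (log-gamma is used only to keep the constant numerically stable), and checks that the integral is close to $\pi$ and that the maximum sits at $\cos\theta_0 = (n-m)/(n+m)$. The function name is hypothetical.

```python
import math

def omega_prime(theta, m, n):
    """Equation (9) for 0 < theta < pi; the constant is formed on the log scale."""
    log_const = (math.log(math.pi) + math.lgamma(m + n + 1)
                 - math.lgamma(m + 0.5) - math.lgamma(n + 0.5))
    return math.exp(log_const + 2 * m * math.log(math.sin(theta / 2))
                    + 2 * n * math.log(math.cos(theta / 2)))

m, n = 3, 5                                             # illustrative numbers of differences and sums
N = 20000
grid = [(k + 0.5) * math.pi / N for k in range(N)]      # midpoints avoid theta = 0 and pi
values = [omega_prime(t, m, n) for t in grid]
total = sum(values) * math.pi / N                       # midpoint rule over (0, pi)
peak = grid[max(range(N), key=values.__getitem__)]
theta0 = math.acos((n - m) / (n + m))

print(total, math.pi)   # the integral should be close to pi
print(peak, theta0)     # the maximum should sit at theta_0
```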


Consider the asymptotic behaviour of (9) for $m$ and $n$ large. For large $x$, $\Gamma(x)$ behaves like $(2\pi)^{1/2}e^{-x}x^{x-1/2}$, and so $\omega'(\theta)$ is asymptotically equal to

$$\frac{\pi\,2^{-m-n}\,e^{-m-n-1}(m+n+1)^{m+n+1/2}}{(2\pi)^{1/2}\,e^{-m-1/2}(m+\tfrac12)^m\,e^{-n-1/2}(n+\tfrac12)^n}(1-\cos\theta)^m(1+\cos\theta)^n = (\tfrac12\pi)^{1/2}\,\frac{2^{-m-n}(m+n+1)^{m+n+1/2}}{(m+\tfrac12)^m(n+\tfrac12)^n}(1-\cos\theta)^m(1+\cos\theta)^n.$$

Now put $\cos\theta = \dfrac{n-m}{n+m} + p$, where $-\dfrac{2n}{n+m} < p < \dfrac{2m}{n+m}$. Then when $p = 0$, $\omega'(\theta)$ is asymptotically equal to

$$(\tfrac12\pi)^{1/2}(m+n+1)^{1/2},$$

which tends to infinity as $n$ increases and $mn^{-1}$ tends to $A$. If $p \neq 0$, $\omega'(\theta)$ is asymptotically equal to

$$(\tfrac12\pi)^{1/2}(m+n+1)^{1/2}\Big[1 - \frac{(m+n)p}{2m}\Big]^m\Big[1 + \frac{(m+n)p}{2n}\Big]^n,$$

and this may be verified to tend to zero as $n$ tends to infinity and $mn^{-1}$ to $A$, uniformly in any closed interval excluding $\theta_0 = \cos^{-1}\dfrac{1-A}{1+A}$. It follows that $\omega(\theta)$ tends to a function with a single step at $\theta_0 = \cos^{-1}\dfrac{1-A}{1+A}$ of magnitude $\pi$, and $\rho_s$ tends to $\cos s\theta_0$.

We now show that this implies that a given stretch $X_1, \dots, X_N$ of the final series tends, in probability, to a sine wave as $n$ increases, $mn^{-1}$ tends to $A$, and $N$ is kept fixed. For consider

$$T = \sum_{i=1}^{N-2}(X_{i+2} - 2\rho_1 X_{i+1} + X_i)^2.$$

Then

$$E(T) = 2(N-2)\,c_0\,(1 - 2\rho_1^2 + \rho_2).$$

As $n$ increases, $c_0^{-1}E(T)$ tends to

$$2(N-2)(1 - 2\cos^2\theta_0 + \cos 2\theta_0) = 0.$$

Given $\epsilon$ small we can choose $n$ so large that

$$c_0^{-1}E(T) < \epsilon^3,$$

and so

$$\Pr\{T > c_0\epsilon^2\} < \epsilon, \qquad \Pr\Big\{\max_{1 \le i \le N-2}|X_{i+2} - 2\rho_1 X_{i+1} + X_i| > \epsilon\,c_0^{1/2}\Big\} < \epsilon.$$





Thus

$$X_{i+2} - 2\rho_1 X_{i+1} + X_i = 0$$

is approximately satisfied for $i = 1, \dots, N-2$, and the probability that the sequence $X_1, \dots, X_N$ will differ from a sine wave by more than a small relative error will be small, for the series $\{X_i\}$, $i = 1, \dots, N$, can then be regarded as the sum of a ‘complementary function’, which is a sine wave, and a ‘particular integral’, which will, in general, be small with $\epsilon$.
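The factor $1 - 2\rho_1^2 + \rho_2$ that controls $c_0^{-1}E(T)$ can be computed exactly for Slutzky’s series, since its serial covariances are the coefficients of $S(z)$. The sketch below (illustrative, with hypothetical helper names) builds the weights of $(1-z)^m(1+z)^n$ in integer arithmetic, takes $m = \tfrac12 n$ so that $A = \tfrac12$, and shows the factor shrinking towards zero as $n$ grows while $\rho_1$ approaches $\cos\theta_0 = \tfrac13$.

```python
def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out

def slutzky_weights(m, n):
    """Integer coefficients of (1 - z)^m (1 + z)^n: n moving sums of two, then m differences."""
    g = [1]
    for _ in range(n):
        g = poly_mul(g, [1, 1])
    for _ in range(m):
        g = poly_mul(g, [1, -1])
    return g

def autocov(g, k):
    """c_k = sum_i g_i g_{i+k}, the coefficient of z^k in S(z), for unit-variance input."""
    return sum(g[i] * g[i + k] for i in range(len(g) - k))

def residual_factor(m, n):
    """1 - 2 rho_1^2 + rho_2, so that c_0^{-1} E(T) = 2 (N - 2) * residual_factor."""
    g = slutzky_weights(m, n)
    c0, c1, c2 = autocov(g, 0), autocov(g, 1), autocov(g, 2)
    return 1 - 2 * (c1 / c0) ** 2 + c2 / c0

for n in (10, 20, 40, 80):
    m = n // 2                      # m / n = A = 1/2, so cos(theta_0) = 1/3
    g = slutzky_weights(m, n)
    print(n, residual_factor(m, n), autocov(g, 1) / autocov(g, 0))
```

Python integers keep the covariances exact, so only the final ratios are rounded.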

Romanovsky (1932, 1933) has generalized Slutzky’s theorem in various ways. If we take a moving average of $s\ (> 2)$ terms and repeat this $n$ times, differencing the results $m$ times, we again get a sinusoidal limit theorem and $S(z)$ will be given by

$$S(z) = (1 - z^{-1})^{m-n}(1 - z)^{m-n}(1 - z^{-s})^n(1 - z^s)^n.$$

This simplifies a good deal of Romanovsky’s proof, but the discussion of the limiting behaviour of $\omega'(\theta)$ is then not so simple as above, and it is necessary to use Romanovsky’s lemma I (1932, p. 85) to complete the proof. In a second paper Romanovsky proves that the result still holds when the process of averaging and differencing is applied to a series which is not random, provided that

$$\sum_{1}^{\infty}\rho_k\cos k\theta$$

is uniformly convergent in $(0, \pi)$ and

$$1 + 2\sum_{1}^{\infty}\rho_k\cos k\theta_0 \neq 0, \qquad (10)$$

where $\theta_0$ is the value of $\theta$ at which $\omega(\theta)$ tends to have a step in the completely random case, as $m$ and $n$ increase. This result follows at once from the above discussion, the operator

$$(1 - z^{-1})^{m-n}(1 - z)^{m-n}(1 - z^{-s})^n(1 - z^s)^n$$

being applied to the c.g.f. of the $x$’s and the condition (10) expressing the fact that $S(z)$ must not have a zero at $z = e^{i\theta_0}$, the result being true even when $\sum_{-\infty}^{\infty}\rho_k$ is only known to be absolutely convergent.

The basic reason for the truth of such theorems as these is worth a little attention. The effect of repeating a moving average with positive weights is to generate a longer moving average whose weights can be approximated by the ordinates of a normal distribution. The effect on this of taking the $m$th difference is to generate a moving average whose weights are approximated by the ordinates of the $m$th derivative of the normal distribution, that is, the $m$th tetrachoric function. These ordinates themselves mimic the oscillations of a sine wave,* and thus Slutzky’s operations have resulted in a moving average with weights oscillating with a period equal to that of the resulting nearly sinusoidal process.

By the use of another parameter $w$ as well as $z$ it is then trivially easy to generalize this type of theorem to the case where we have a set of random variables $\{x_{ij}\}$ $(i, j = \dots, -1, 0, 1, \dots)$ arranged in a lattice. If we take repeated moving averages of the form

$$x'_{ij} = x_{ij} + x_{i,j-1} + \dots$$

and then repeated differences of the form 

we arrive at a process which mimics the product of two sine waves, one along the i axis and 
one along the j axis. In this way approximate solutions of partial differential equations in 
two or more dimensions can be built up out of random elements, but such solutions do not 
satisfy prescribed boundary conditions and seem to be relatively trivial. 

* See, for example, Szegő (1939, p. 194, formula (8.22.8)).




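Theory of multivariate stochastic processes

Consider a stationary $p$-dimensional process defined by a random column vector

$$\mathbf{x}_t \quad (t = 0, \pm 1, \dots)$$

with $p$ components and a transpose

$$\mathbf{x}'_t = (x_t^1, \dots, x_t^p).$$

For given $s\ (= 0, \pm 1, \dots)$ we define the matrix

$$(c_s^{jk}) = E(\mathbf{x}_t\mathbf{x}'_{t+s}) = (c_{-s}^{jk})' = \begin{pmatrix} c_s^{11} & \cdots & c_s^{1p}\\ \vdots & & \vdots\\ c_s^{p1} & \cdots & c_s^{pp}\end{pmatrix},$$

where $c_s^{jk} = E(x_t^j x_{t+s}^k) = c_{-s}^{kj}$.

We write $c_0^{jj} = \sigma_j^2$ and $\rho_s^{jk} = c_s^{jk}/(\sigma_j\sigma_k)$.

Then Cramér (1940) has shown that there exist $p^2$ (possibly complex) functions $\omega_{jk}(\theta)$, defined for $-\pi \le \theta \le \pi$, which are of bounded variation and such that

$$\rho_s^{jk} = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-is\theta}\,d\omega_{jk}(\theta).$$

For $j = k$, $\omega_{jk}(\theta)$ is real and non-decreasing. From this it then follows, as before, that if the series $\sum_{s=-\infty}^{\infty} c_s^{jk}$ is absolutely convergent, $\omega'_{jk}(\theta)$ exists and is given by

$$\omega'_{jk}(\theta) = \sum_{s=-\infty}^{\infty}\rho_s^{jk}e^{is\theta}.$$

We write $z = e^{i\theta}$ and put

$$S_{jk}(z) = \sum_{-\infty}^{\infty} c_s^{jk} z^s,$$

and we define the matrix covariance generating function

$$S(z) = \Big(\sum_{-\infty}^{\infty} c_s^{jk} z^s\Big).$$

In all the applications with which we shall be concerned the series in each element of this matrix will be convergent in an annulus $1 - \delta < |z| < 1 + \delta$. The matrix is therefore an analytic matrix function of $z$ in such an annulus.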

Now define another vector process $\{\mathbf{y}_t\}$ as a moving matrix average of the $\mathbf{x}$’s by the equation

$$\mathbf{y}_t = A_0\mathbf{x}_t + A_1\mathbf{x}_{t-1} + \dots,$$

where each $A_i$ is a $p \times p$ matrix. It is then not difficult to show, by Kolmogoroff’s theorem on the probability of convergence of infinite random series, that $\{\mathbf{y}_t\}$ will be a well-defined random process if each of the $p^2$ series formed by the $(i, j)$th element of the $A$’s is absolutely convergent. Write $(\bar c_s^{jk})$ for the serial covariance matrix of order $s$ of the $\mathbf{y}$’s. Then

$$(\bar c_s^{jk}) = E(\mathbf{y}_t\mathbf{y}'_{t+s}) = E\Big(\sum_{m=0}^{\infty}A_m\mathbf{x}_{t-m}\sum_{n=0}^{\infty}\mathbf{x}'_{t+s-n}A'_n\Big) = \sum_{m=0}^{\infty}\sum_{n=0}^{\infty}A_m\,(c_{s+m-n}^{jk})\,A'_n.$$





If we denote the matrix c.g.f. of the process $\{\mathbf{y}_t\}$ by $S_y(z)$, we have

$$S_y(z) = \sum_{-\infty}^{\infty}(\bar c_s^{jk})z^s = \sum_{s=-\infty}^{\infty}\sum_{m=0}^{\infty}\sum_{n=0}^{\infty}A_m\,(c_{s+m-n}^{jk})\,A'_n z^s = \Big(\sum_{m=0}^{\infty}A_m z^{-m}\Big)S(z)\Big(\sum_{n=0}^{\infty}A'_n z^{n}\Big). \qquad (11)$$

This is the fundamental result for vector processes corresponding to (5), and by its aid we can now discuss the solution of vector stochastic difference equations. It may be compared with the corresponding result for continuous processes (Bartlett, 1947, eqn. (7), p. 92).

Consider a vector stochastic difference equation

$$\mathbf{x}_t + A_1\mathbf{x}_{t-1} + \dots + A_k\mathbf{x}_{t-k} = \boldsymbol\eta_t. \qquad (12)$$

We suppose that $\boldsymbol\eta_t$ is a column vector of $\eta$’s such that their variance-covariance matrix is $B = (b_{jk})$ and that the $\boldsymbol\eta_t$ for different values of $t$ are independent. Then the matrix spectral generating function of $\{\boldsymbol\eta_t\}$ is also $B$.

To ensure that the $\{\mathbf{x}_t\}$ process is stationary, we must impose the condition that the roots of

$$\Big|I + \sum_{i=1}^{k}A_i z^i\Big| = 0, \qquad (13)$$

which is an equation of degree $pk$ at most, all lie outside the circle $|z| = 1$. We now show that this condition implies that for each pair $(j, k)$ the series $\sum_{s=0}^{\infty}c_s^{jk}$, $\sum_{s=0}^{\infty}c_{-s}^{jk}$ are dominated by a convergent geometric series. To do this we first obtain a difference equation for each

$$c_s^{jk} \quad (= c_{-s}^{kj} \text{ if } j \neq k)$$


which does not involve the other $c$’s. If $D$ is an operator which lowers by unity the suffix $t$ of any term which it pre- or post-multiplies, we can represent the matrix difference equation (12) by

$$\Big(I + \sum_{i=1}^{k}A_i D^i\Big)\mathbf{x}_t = \boldsymbol\eta_t.$$

This is algebraically equivalent to $p$ difference equations. Apply the signed operator cofactors of the $i$th column of the operator matrix $I + \sum A_i D^i$ to each of these $p$ equations in succession and add. Denote these cofactors by $C_1^i(D), \dots, C_p^i(D)$. We then obtain

$$\Big|I + \sum_{i=1}^{k}A_i D^i\Big|\,x_t^i = C_1^i(D)\,\eta_t^1 + \dots + C_p^i(D)\,\eta_t^p. \qquad (14)$$

This is a stochastic difference equation for $x_t^i$ (for fixed $i$) whose right-hand side is a finite moving average of $\eta$’s. The condition on the roots of (13) ensures that the process $\{x_t^i\}$ which it generates will be stationary. The highest power of $D$ occurring on the right-hand side is not greater than $k(p-1)$, and so in (14) no $\eta$ occurs with a suffix earlier than $t - k(p-1)$. If, therefore, we multiply (14) by $x_{t-s}^j$, where $s > k(p-1)$, and take the expectation we get

$$\Big|I + \sum_{i=1}^{k}A_i D^i\Big|\,c_s^{ji} = 0,$$

where $D$ now lowers the suffix $s$ by unity,





and the condition on the roots of (13) shows that the series $\sum_{s=0}^{\infty}c_s^{ji}$ is dominated by a convergent geometric series. We obtain a similar result for the series $\sum_{s=0}^{\infty}c_{-s}^{ji}$, which will also be dominated by a convergent geometric series.

It follows that the matrix covariance generating function of the process $\{\mathbf{x}_t\}$ is convergent in an annulus $1 - \delta < |z| < 1 + \delta$ and is given by

$$S(z) = \Big(I + \sum_{i=1}^{k}A_i z^{-i}\Big)^{-1}B\,\Big(I + \sum_{i=1}^{k}A'_i z^{i}\Big)^{-1},$$

and from this all the serial correlations and the individual power spectra can be calculated.
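For $p = 2$, $k = 1$ the last formula can be compared with a direct calculation. In the sketch below (illustrative matrices, hypothetical helper names, plain Python so that it is self-contained), $\mathbf{x}_t + A_1\mathbf{x}_{t-1} = \boldsymbol\eta_t$ is rewritten as $\mathbf{x}_t = \Phi\mathbf{x}_{t-1} + \boldsymbol\eta_t$ with $\Phi = -A_1$, so that $c_0 = \sum_j \Phi^j B\Phi'^j$ and $c_1 = E(\mathbf{x}_t\mathbf{x}'_{t+1}) = c_0\Phi'$. The coefficients of $z^0$ and $z^1$ in $S(z)$ are extracted by averaging $S(e^{i\theta})e^{-is\theta}$ over an equally spaced grid of $\theta$.

```python
import cmath
import math

def mat_mul(a, b):
    """Product of two small matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def inv2(m):
    """Inverse of a 2 x 2 (possibly complex) matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def cgf(z, A1, B):
    """S(z) = (I + A1 z^{-1})^{-1} B (I + A1' z)^{-1} for x_t + A1 x_{t-1} = eta_t."""
    left = inv2([[1 + A1[0][0] / z, A1[0][1] / z],
                 [A1[1][0] / z, 1 + A1[1][1] / z]])
    A1t = transpose(A1)
    right = inv2([[1 + A1t[0][0] * z, A1t[0][1] * z],
                  [A1t[1][0] * z, 1 + A1t[1][1] * z]])
    return mat_mul(mat_mul(left, B), right)

def coefficient(s, A1, B, n_grid=512):
    """Coefficient of z^s in S(z): mean of S(e^{i theta}) e^{-i s theta} over a uniform grid."""
    total = [[0j, 0j], [0j, 0j]]
    for k in range(n_grid):
        z = cmath.exp(2j * math.pi * k / n_grid)
        S = cgf(z, A1, B)
        weight = z ** (-s)
        for i in range(2):
            for j in range(2):
                total[i][j] += S[i][j] * weight
    return [[(total[i][j] / n_grid).real for j in range(2)] for i in range(2)]

def stationary_cov(Phi, B, terms=200):
    """c_0 = sum_j Phi^j B Phi'^j for x_t = Phi x_{t-1} + eta_t."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    P = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(terms):
        term = mat_mul(mat_mul(P, B), transpose(P))
        total = [[total[i][j] + term[i][j] for j in range(2)] for i in range(2)]
        P = mat_mul(Phi, P)
    return total

A1 = [[-0.5, -0.2], [0.1, -0.3]]   # illustrative; Phi = -A1 has eigenvalues 0.4 +/- 0.1i
B = [[1.0, 0.3], [0.3, 2.0]]       # illustrative variance-covariance matrix of eta_t
Phi = [[-v for v in row] for row in A1]

c0_direct = stationary_cov(Phi, B)
c1_direct = mat_mul(c0_direct, transpose(Phi))   # E(x_t x'_{t+1}) = c_0 Phi'
c0_cgf = coefficient(0, A1, B)
c1_cgf = coefficient(1, A1, B)
print(c0_direct, c0_cgf)
print(c1_direct, c1_cgf)
```

The eigenvalues of $\Phi$ lie inside the unit circle, which is the same as requiring the roots of (13) to lie outside it, so both routes converge.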

REFERENCES

Bartlett, M. S. (1947). Stochastic Processes. Lectures given at North Carolina.

Cramér, H. (1940). Ann. Math., Princeton, 41, 215.

Khintchine, A. (1934). Math. Ann. 109, 604.

Mann, H. B. & Wald, A. (1943). Econometrica, 11, 173.

Quenouille, M. H. (1947). Biometrika, 34, 395.

Romanovsky, V. (1932). R.C. Circ. Mat. Palermo, 56, 1.

Romanovsky, V. (1933). R.C. Circ. Mat. Palermo, 57, 130.

Slutzky, E. (1937). Econometrica, 5, 107.

Szegő, G. (1939). Orthogonal Polynomials. New York.

Wold, H. (1938). A Study in the Analysis of Stationary Time Series. Uppsala.



ON A PROPERTY OF DISTRIBUTIONS ADMITTING
SUFFICIENT STATISTICS

By V. S. HUZURBAZAR, Fitzwilliam House, Cambridge


Summary. A property of distributions admitting sufficient statistics is obtained, connecting the likelihood function of a sample of $n$ observations, the maximum likelihood estimates of the parameters and the information matrix. A geometric meaning of the property is given. The property is used in simplifying the calculations of the variances and covariances of the maximum likelihood estimates in large samples. Finally, it is shown by virtue of the property that the likelihood equations have a unique solution for every sample of any size, and that the solution does make the likelihood function a maximum.

1. Let $f(x, \theta_1, \theta_2, \dots, \theta_p)$ be the probability density function of a distribution depending on $p$ parameters. For brevity we shall write $(\theta_j)$ for $(\theta_1, \theta_2, \dots, \theta_p)$ as an argument of a function. For simplicity we shall confine ourselves to univariate distributions, but the analysis is exactly the same for multivariate distributions. For multivariate distributions of $q$ random variables, $(x)$ is simply replaced by $(x_1, x_2, \dots, x_q)$. If $x_1, x_2, \dots, x_n$ is a sample of $n$ independent observations from the distribution we shall call, for convenience,

$$L = \sum_{i=1}^{n}\log f(x_i, \theta_j)$$

the likelihood function of the sample.

It has been shown, under general regularity conditions, by Koopman (1936) that the necessary and sufficient condition that a distribution depending on $p$ parameters should admit a set of $p$ jointly sufficient statistics is that the probability density function of the distribution be of the form

$$f(x, \theta_j) = \exp\Big\{\sum_{k=1}^{p}u_k(\theta_j)\,v_k(x) + A(x) + B(\theta_j)\Big\}, \qquad (1)$$

where the $u_k$’s and $B$ are functions of the $\theta_j$’s only and the $v_k$’s and $A$ are functions of $x$ only.

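Let now $f(x, \theta_j)$ be of the form (1). We have

$$L = \sum_{i=1}^{n}\log f(x_i, \theta_j) = \sum_{k=1}^{p}u_k(\theta_j)\sum_{i=1}^{n}v_k(x_i) + \sum_{i=1}^{n}A(x_i) + nB(\theta_j) = \sum_{k=1}^{p}u_k(\theta_j)\,T_k + \sum_{i=1}^{n}A(x_i) + nB(\theta_j),$$

where $T_k = \sum_{i=1}^{n}v_k(x_i)$, and

$$\frac{\partial L}{\partial\theta_r} = \sum_{k=1}^{p}\frac{\partial u_k}{\partial\theta_r}T_k + n\frac{\partial B}{\partial\theta_r} \quad (r = 1, 2, \dots, p). \qquad (2)$$

Let $\hat\theta_1, \hat\theta_2, \dots, \hat\theta_p$ be a solution of the system of ‘likelihood equations’ $\partial L/\partial\theta_r = 0$ $(r = 1, 2, \dots, p)$,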



72 On a property of distributions admitting sufficient statistics 

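so that the $\hat\theta_j$’s are the maximum likelihood estimates of the $\theta_j$’s. Then

$$\sum_{k=1}^{p}\Big(\frac{\partial u_k}{\partial\theta_r}\Big)_{\hat\theta_j}T_k + n\Big(\frac{\partial B}{\partial\theta_r}\Big)_{\hat\theta_j} = 0 \quad (r = 1, 2, \dots, p), \qquad (3)$$

where by the notation $(\partial u_k/\partial\theta_r)_{\hat\theta_j}$ we mean that the derivative is evaluated at $(\hat\theta_j)$. Since $E(\partial L/\partial\theta_r) = 0$, we have

$$\sum_{k=1}^{p}\frac{\partial u_k}{\partial\theta_r}E(T_k) + n\frac{\partial B}{\partial\theta_r} = 0 \quad (r = 1, 2, \dots, p). \qquad (4)$$

Also

$$\frac{\partial^2 L}{\partial\theta_r\,\partial\theta_s} = \sum_{k=1}^{p}\frac{\partial^2 u_k}{\partial\theta_r\,\partial\theta_s}T_k + n\frac{\partial^2 B}{\partial\theta_r\,\partial\theta_s} \quad (r, s = 1, 2, \dots, p). \qquad (5)$$

Hence

$$E\Big(\frac{\partial^2 L}{\partial\theta_r\,\partial\theta_s}\Big) = \sum_{k=1}^{p}\frac{\partial^2 u_k}{\partial\theta_r\,\partial\theta_s}E(T_k) + n\frac{\partial^2 B}{\partial\theta_r\,\partial\theta_s} \qquad (6)$$

and

$$\Big(\frac{\partial^2 L}{\partial\theta_r\,\partial\theta_s}\Big)_{\hat\theta_j} = \sum_{k=1}^{p}\Big(\frac{\partial^2 u_k}{\partial\theta_r\,\partial\theta_s}\Big)_{\hat\theta_j}T_k + n\Big(\frac{\partial^2 B}{\partial\theta_r\,\partial\theta_s}\Big)_{\hat\theta_j}. \qquad (7)$$

The $p$ simultaneous linear equations given by (4) enable us to evaluate $E(T_k)$ $(k = 1, 2, \dots, p)$ in terms of the $\theta_j$’s. Substituting for $E(T_k)$ in (6) we shall have

$$E\Big(\frac{\partial^2 L}{\partial\theta_r\,\partial\theta_s}\Big) = \phi_{rs}(\theta_j), \text{ say}. \qquad (8)$$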


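The $p$ simultaneous linear equations given by (3) also enable us to express the $T_k$’s in terms of the $\hat\theta_j$’s, whence we can substitute for the $T_k$’s in (7). But it is interesting to observe that the pair of equations (3) and (7) is exactly of the same form as the pair of equations (4) and (6). In fact, we can obtain the former pair from the latter by writing $T_k$ for $E(T_k)$, $\hat\theta_j$ for $\theta_j$, and $(\partial^2 L/\partial\theta_r\,\partial\theta_s)_{\hat\theta_j}$ for $E(\partial^2 L/\partial\theta_r\,\partial\theta_s)$. Hence, comparing with (8), the result of substitution for the $T_k$’s in (7) must be

$$\Big(\frac{\partial^2 L}{\partial\theta_r\,\partial\theta_s}\Big)_{\hat\theta_j} = \phi_{rs}(\hat\theta_j). \qquad (9)$$

From (8) and (9) we have the property

$$\Big(\frac{\partial^2 L}{\partial\theta_r\,\partial\theta_s}\Big)_{\hat\theta_j} = \Big[E\Big(\frac{\partial^2 L}{\partial\theta_r\,\partial\theta_s}\Big)\Big]_{\hat\theta_j}, \qquad (10)$$

where by the notation on the right-hand side it is implied that $\theta_j$ is replaced by $\hat\theta_j$ after the expectation is evaluated.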

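Similar argument leads to the generalization of (10) as

$$\Big(\frac{\partial^m L}{\partial\theta_1^{m_1}\,\partial\theta_2^{m_2}\,\partial\theta_3^{m_3}\cdots}\Big)_{\hat\theta_j} = \Big[E\Big(\frac{\partial^m L}{\partial\theta_1^{m_1}\,\partial\theta_2^{m_2}\,\partial\theta_3^{m_3}\cdots}\Big)\Big]_{\hat\theta_j}, \qquad (11)$$

where $m = m_1 + m_2 + m_3 + \dots$. Setting $r, s = 1, 2, \dots, p$, (10) may be written in the matrix form

$$\Big[\Big(\frac{\partial^2 L}{\partial\theta_r\,\partial\theta_s}\Big)_{\hat\theta_j}\Big] = \Big[E\Big(\frac{\partial^2 L}{\partial\theta_r\,\partial\theta_s}\Big)\Big]_{\hat\theta_j}. \qquad (12)$$

It is convenient to write (12) as

$$\Big[\Big(-\frac{\partial^2 L}{\partial\theta_r\,\partial\theta_s}\Big)_{\hat\theta_j}\Big] = [I(\hat\theta_j)], \qquad (13)$$

where

$$[I(\theta_j)] = \Big[E\Big(-\frac{\partial^2 L}{\partial\theta_r\,\partial\theta_s}\Big)\Big]$$

is Fisher’s ‘information matrix’.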



V. S. Huzurbazar


73 


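2. Geometrical significance. It is possible to give a geometrical meaning to the relation (13). We shall, however, confine ourselves to the case of a single parameter $\theta$. We have

$$\Big(-\frac{\partial^2 L}{\partial\theta^2}\Big)_{\hat\theta} = \Big[E\Big(-\frac{\partial^2 L}{\partial\theta^2}\Big)\Big]_{\hat\theta} = I(\hat\theta), \qquad (14)$$

where $I(\theta)$ is the information function of a sample of $n$ observations. $I(\theta)$ is also known as the ‘intrinsic accuracy’ of the distribution (Fisher, 1925). Since $I(\theta)$ is essentially non-negative, $I(\hat\theta)$ is non-negative, whence from (14) $(-\partial^2 L/\partial\theta^2)_{\hat\theta}$ is non-negative. The curve represented by the likelihood function of a sample may be called the ‘likelihood curve’. If $\rho$ is the radius of curvature of the likelihood curve at $\theta = \hat\theta$,

$$\frac{1}{\rho} = \frac{(-\partial^2 L/\partial\theta^2)_{\hat\theta}}{\{1 + (\partial L/\partial\theta)_{\hat\theta}^2\}^{3/2}} = \Big(-\frac{\partial^2 L}{\partial\theta^2}\Big)_{\hat\theta}, \quad\text{since}\quad \Big(\frac{\partial L}{\partial\theta}\Big)_{\hat\theta} = 0.$$

Hence

$$\frac{1}{\rho} = I(\hat\theta). \qquad (15)$$

The information function (or the intrinsic accuracy) therefore measures the curvature of the likelihood curve at the point represented by the maximum likelihood estimate.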

For large samples we have another interpretation. In large samples the variance of $\hat\theta$ is $1/I(\theta)$ and the estimate of the variance of $\hat\theta$ is $1/I(\hat\theta)$. Hence the radius of curvature of the likelihood curve at the maximum likelihood estimate gives an estimate of the variance of the maximum likelihood estimate in large samples.
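A one-parameter illustration of (14) and (15), using the Poisson distribution as an assumed example: it is of the form (1) with $u(\lambda) = \log\lambda$, $v(x) = x$, $A(x) = -\log x!$ and $B(\lambda) = -\lambda$. The sketch (invented counts, hypothetical helper names, second derivative by a central difference) shows that the curvature $(-\partial^2 L/\partial\lambda^2)_{\hat\lambda}$ agrees with $I(\hat\lambda) = n/\hat\lambda$, although at other values of $\lambda$ the two differ, and that its reciprocal is the usual large-sample variance estimate $\hat\lambda/n$.

```python
import math

def loglik(lam, xs):
    """L = sum_i log f(x_i, lam) for the Poisson distribution."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in xs)

def observed_curvature(lam, xs, rel_step=1e-4):
    """-(d^2 L / d lam^2) by a central second difference."""
    h = rel_step * lam
    return -(loglik(lam + h, xs) - 2 * loglik(lam, xs) + loglik(lam - h, xs)) / h ** 2

def information(lam, n):
    """I(lam) = E(-d^2 L / d lam^2) = n / lam for a Poisson sample of size n."""
    return n / lam

xs = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]            # invented counts
lam_hat = sum(xs) / len(xs)                    # maximum likelihood estimate
curvature = observed_curvature(lam_hat, xs)    # 1/rho in (15)
var_estimate = 1 / curvature                   # large-sample estimate of var(lam_hat)

print(curvature, information(lam_hat, len(xs)))                # agree up to rounding
print(observed_curvature(3.0, xs), information(3.0, len(xs)))  # differ away from lam_hat
```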

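3. The variances and covariances of the maximum likelihood estimates in large samples. If instead of replacing $\theta_j$ by $\hat\theta_j$ we replace $\hat\theta_j$ by $\theta_j$, we shall have, in place of (13),

$$\Big[\Big(-\frac{\partial^2 L}{\partial\theta_r\,\partial\theta_s}\Big)_{\hat\theta_j}\Big]_{\hat\theta_j = \theta_j} = [I(\theta_j)]. \qquad (16)$$

Taking inverses, when the matrices are non-singular,

$$\Big[\Big(-\frac{\partial^2 L}{\partial\theta_r\,\partial\theta_s}\Big)_{\hat\theta_j}\Big]^{-1}_{\hat\theta_j = \theta_j} = [I(\theta_j)]^{-1}. \qquad (17)$$

The matrix on the right-hand side of (17) is the variance matrix whose elements give the variances and covariances of the maximum likelihood estimates in large samples. But the left-hand side requires only the second derivatives $(\partial^2 L/\partial\theta_r\,\partial\theta_s)_{\hat\theta_j}$, and no expectations. The variances and covariances can therefore be simply obtained from the matrix on the left-hand side of (17).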

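As an example (cf. Kendall, 1946, p. 37) consider the estimation of the five parameters of the bivariate normal distribution

$$dF = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\Big[-\frac{1}{2(1-\rho^2)}\Big\{\frac{(x-\alpha)^2}{\sigma_1^2} - \frac{2\rho(x-\alpha)(y-\beta)}{\sigma_1\sigma_2} + \frac{(y-\beta)^2}{\sigma_2^2}\Big\}\Big]\,dx\,dy.$$

The maximum likelihood estimates are

$$\hat\alpha = \bar x, \quad \hat\beta = \bar y, \quad \hat\sigma_1^2 = \frac{1}{n}\sum(x-\bar x)^2, \quad \hat\sigma_2^2 = \frac{1}{n}\sum(y-\bar y)^2, \quad \hat\rho\,\hat\sigma_1\hat\sigma_2 = \frac{1}{n}\sum(x-\bar x)(y-\bar y).$$

Also

$$L = -n\log\big[2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}\big] - \frac{1}{2(1-\rho^2)}\sum\Big\{\frac{(x-\alpha)^2}{\sigma_1^2} - \frac{2\rho(x-\alpha)(y-\beta)}{\sigma_1\sigma_2} + \frac{(y-\beta)^2}{\sigma_2^2}\Big\}.$$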



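Now

$$\frac{\partial^2 L}{\partial\sigma_1^2} = \frac{n}{\sigma_1^2} - \frac{n}{1-\rho^2}\Big\{\frac{3\hat\sigma_1^2}{\sigma_1^4} - \frac{2\rho\hat\rho\hat\sigma_1\hat\sigma_2}{\sigma_1^3\sigma_2} + \frac{3(\hat\alpha-\alpha)^2}{\sigma_1^4} - \frac{2\rho(\hat\alpha-\alpha)(\hat\beta-\beta)}{\sigma_1^3\sigma_2}\Big\}.$$

Hence

$$\Big(-\frac{\partial^2 L}{\partial\sigma_1^2}\Big)_{\hat\theta_j} = -\frac{n}{\hat\sigma_1^2} + \frac{n}{1-\hat\rho^2}\Big(\frac{3}{\hat\sigma_1^2} - \frac{2\hat\rho^2}{\hat\sigma_1^2}\Big) = \frac{n(2-\hat\rho^2)}{\hat\sigma_1^2(1-\hat\rho^2)}.$$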
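The closed form just obtained can be checked against a numerical second derivative of $L$ at the maximum likelihood estimates. The eight data pairs below are invented for illustration, the names are hypothetical, and the sketch perturbs only $\sigma_1$, holding the other four parameters at their estimates.

```python
import math

xs = [1.2, 0.4, 2.3, 1.9, 0.7, 1.5, 2.8, 0.9]   # invented sample
ys = [2.0, 1.1, 2.9, 2.2, 1.5, 1.4, 3.3, 1.8]
n = len(xs)

def loglik(alpha, beta, s1, s2, rho):
    """L for the bivariate normal, as written above."""
    q = sum((x - alpha) ** 2 / s1 ** 2
            - 2 * rho * (x - alpha) * (y - beta) / (s1 * s2)
            + (y - beta) ** 2 / s2 ** 2 for x, y in zip(xs, ys))
    return -n * math.log(2 * math.pi * s1 * s2 * math.sqrt(1 - rho ** 2)) - q / (2 * (1 - rho ** 2))

# maximum likelihood estimates (divisor n)
a_hat, b_hat = sum(xs) / n, sum(ys) / n
s1_hat = math.sqrt(sum((x - a_hat) ** 2 for x in xs) / n)
s2_hat = math.sqrt(sum((y - b_hat) ** 2 for y in ys) / n)
r_hat = sum((x - a_hat) * (y - b_hat) for x, y in zip(xs, ys)) / (n * s1_hat * s2_hat)

def L_s1(s1):
    """L as a function of sigma_1 alone, other parameters at their estimates."""
    return loglik(a_hat, b_hat, s1, s2_hat, r_hat)

h = 1e-4 * s1_hat
numeric = -(L_s1(s1_hat + h) - 2 * L_s1(s1_hat) + L_s1(s1_hat - h)) / h ** 2
closed_form = n * (2 - r_hat ** 2) / (s1_hat ** 2 * (1 - r_hat ** 2))
print(numeric, closed_form)
```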

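4. Maxima of the likelihood function. Let $\hat\theta_j$ $(j = 1, 2, \dots, p)$ be a solution of the system of likelihood equations. The information matrix

$$\Big[E\Big(-\frac{\partial^2 L}{\partial\theta_r\,\partial\theta_s}\Big)\Big]$$

is essentially non-negative for all $\theta_j$’s, since it is a matrix of the variances and covariances of $\frac{\partial}{\partial\theta_r}\log f(x, \theta_j)$, $r = 1, 2, \dots, p$. Moreover, if for a certain set of values of the $\theta_j$’s any of the principal minors of the matrix

$$\Big|E\Big\{\frac{\partial}{\partial\theta_r}\log f(x, \theta_j)\,\frac{\partial}{\partial\theta_s}\log f(x, \theta_j)\Big\}\Big|$$

vanishes, then the functions

$$\frac{\partial}{\partial\theta_r}\log f(x, \theta_j) \quad (r = 1, 2, \dots, p)$$

will be linearly dependent for that set of values of the $\theta_j$’s. We shall, however, exclude such exceptional cases and assume that the functions $\frac{\partial}{\partial\theta_r}\log f(x, \theta_j)$ are linearly independent for all sets of values of the $\theta_j$’s, so that the information matrix $\big[E\big(-\frac{\partial^2 L}{\partial\theta_r\,\partial\theta_s}\big)\big]$ is positive definite for all $\theta_j$’s. Hence

$$\Big[E\Big(-\frac{\partial^2 L}{\partial\theta_r\,\partial\theta_s}\Big)\Big]_{\hat\theta_j}$$

is positive definite, and, in virtue of (13), it follows that the matrix $\big[\big(-\frac{\partial^2 L}{\partial\theta_r\,\partial\theta_s}\big)_{\hat\theta_j}\big]$ is positive definite. The matrix $\big[\big(\frac{\partial^2 L}{\partial\theta_r\,\partial\theta_s}\big)_{\hat\theta_j}\big]$ is negative definite and $L$ is therefore a maximum at $\theta_j = \hat\theta_j$.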

It also follows that a solution of the system of likelihood equations is unique. For by what we have just shown every solution of the likelihood equations gives a stationary maximum. If there were two or more distinct solutions all would be stationary maxima, and between two stationary maxima we should have a stationary minimum, under regularity conditions. But there is no solution which gives a minimum. Hence the system of likelihood equations has a unique solution, at which the likelihood function is a maximum, for every sample of any size. In a recent paper (Huzurbazar, 1948) I have discussed the maxima of the likelihood function of a sample from any distribution depending on a single parameter.


REFERENCES

Fisher, R. A. (1925). Proc. Camb. Phil. Soc. 22, 700-25.

Huzurbazar, V. S. (1948). Ann. Eugen., Lond., 14, 185-200.

Kendall, M. G. (1946). The Advanced Theory of Statistics, 2. London.

Koopman, B. O. (1936). Trans. Amer. Math. Soc. 39, 399-409.



[ 75 ] 


ON A METHOD OF TREND ELIMINATION 
By M. H. QUENOUILLE 
1. Introduction 

The problem of trend elimination is familiar in biology and in economics. Three main methods exist at present for the purpose of eliminating trend. First, there is the method of block elimination by which the data are broken into groups and the difference between groups is eliminated. This has the disadvantage that when the trend is marked it is only partially eliminated, and a second method whereby a curve is fitted to the observations may be used (e.g. Fisher, 1924). This method, although it allows the effect of the trend elimination to be estimated, can be criticized on the grounds that it is seldom possible to represent a trend by an algebraic curve of low degree, and the residuals will be correlated if the algebraic representation is inadequate and will lead in consequence to spurious correlations. In any case, whereas there may be little doubt about the existence of an ‘ideal’ curve, this curve may not be readily realized in practice. Such is often true with growth curves where a deviation at an early stage is reflected throughout the subsequent observations, or where the curve is in reality discontinuous, as with the sudden pause in the rate of growth at puberty or the feeding-stuffs requirements of dairy cattle. This difficulty has led to the third method of trend elimination, namely, the method of moving averages. By this method, curves are fitted to sets of points and used to find the deviation for the central point of each set. Thus in effect

$$\sum_{t=i-m}^{i+m}\{u_t - a_0 - a_1(t-i) - \dots - a_p(t-i)^p\}^2$$

is minimized and $a_0$ is used instead of $u_i$. This process is repeated to find the ‘smoothed’ values for the other observations, but in practice the method is formularized so that the smoothed value for $u_i$ is obtained from a moving average $\sum_{t=-m}^{m}b_t u_{i+t}$, where $b_t$ is independent of $i$ and conventionally, but not necessarily, $b_t = b_{-t}$. However, the effect of the correlation introduced into the residual series by the moving average is not easily ascertained and tests of significance are complicated.
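The weights $b_t$ follow from ordinary least squares: $a_0$ is the first element of $(X'X)^{-1}X'u$, where $X$ has rows $(1, t, \dots, t^p)$ for $t = -m, \dots, m$. The sketch below (helper names hypothetical; exact rational arithmetic via `fractions.Fraction`; it assumes $2m + 1 > p$ so that $X'X$ is non-singular) computes them. With $m = 2$, $p = 2$ it gives the familiar five-point quadratic weights $(-3, 12, 17, 12, -3)/35$, and the symmetry $b_t = b_{-t}$ appears automatically because the points are placed symmetrically about the centre.

```python
from fractions import Fraction

def solve(matrix, rhs):
    """Solve matrix @ x = rhs exactly by Gauss-Jordan elimination (matrix assumed non-singular)."""
    n = len(matrix)
    aug = [list(row) + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]

def central_weights(m, p):
    """Weights b_t, t = -m..m, such that a_0 = sum_t b_t u_{i+t} for a degree-p least-squares fit."""
    ts = list(range(-m, m + 1))
    xtx = [[Fraction(sum(t ** (j + k) for t in ts)) for k in range(p + 1)] for j in range(p + 1)]
    e0 = [Fraction(1)] + [Fraction(0)] * p
    c = solve(xtx, e0)                      # first row (= first column) of (X'X)^{-1}
    return [sum(c[k] * t ** k for k in range(p + 1)) for t in ts]

print(central_weights(2, 2))   # five-point quadratic smoothing weights
```

The weights always sum to one, since a fitted polynomial reproduces a constant series exactly.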

In the following paper, a compromise between curve-fitting and moving-average approaches to trend elimination is suggested by which curves are fitted to portions of the series of observations, the observations in each portion being assumed to be equally spaced.

The method here proposed concentrates primarily on eliminating those systematic elements which are generally known as trend, in order to investigate the residuals. It will not always provide a trend line in the sense of a smooth curve, and attention in the following is concentrated particularly on trend lines which have a series of discontinuities in their first derivatives. This is not an essential of the method here described, and it is possible to ensure that the trend obtained is smooth, but this in my opinion can only be done at the expense of the residuals. If we insist upon a smooth trend line then we impose upon the independence of the residuals whenever this assumption is not justified in practice. From this viewpoint the discontinuities in the fitted trend are far from being a disadvantage. This method is hardly new,* but it is believed that some of the methods adopted here make it worthy of greater application.

* [See, for example, E. C. Rhodes (1921), Tracts for Computers, no. VI, Cambridge University Press. 
Ed.] 



76 


On a method of trend elimination 


2. The form of the trend

The simplest form of trend that might be taken is a series of straight lines fitted to successive sets of three points. This may be done by the method of least squares by fitting constants

$$2b_1,\ b_1 + b_2,\ 2b_2,\ b_2 + b_3,\ 2b_3,\ b_3 + b_4,\ 2b_4, \text{ etc.},$$

to represent the trend values at successive points of the series. This might be compared with the analysis of randomized blocks of two, since in this case the first differences of the trend values are

$$b_2 - b_1,\ b_2 - b_1,\ b_3 - b_2,\ b_3 - b_2,\ b_4 - b_3,\ b_4 - b_3, \text{ etc.}$$

Similarly, the scheme obtained by fitting constants

$$sb_1,\ (s-1)b_1 + b_2,\ (s-2)b_1 + 2b_2,\ (s-3)b_1 + 3b_2, \text{ etc.},$$

might be compared with randomized blocks of $s$. This scheme, which corresponds to the fitting of straight lines to consecutive sets of $s$ points, will be said to be of separation $s$.

Again, the trend obtained by fitting a series of overlapping quadratic parabolas to successive sets of four points corresponds to the set of constants

$$3b_1 + b_2,\ b_1 + 3b_2,\ 3b_2 + b_3,\ b_2 + 3b_3,\ 3b_3 + b_4, \text{ etc.},$$

and since the second differences are equal in pairs, this will be said to be of separation two and degree two.

Schemes of any separation and degree may be generated as follows. Suppose

$$(1 + t + \dots + t^{s-1})^{d+1} = a_0 + a_1 t + a_2 t^2 + \dots;$$

then

$$\sum a_{is}b_{i+1},\ \sum a_{is-1}b_{i+1},\ \sum a_{is-2}b_{i+1}, \text{ etc.},$$

represent a scheme of separation $s$ and degree $d$. For the $(d+1)$th differences of this series involve the $(d+1)$th differences of the constants $a$. These are generated by multiplying by $(1-t)^{d+1}$, i.e. they are given by $(1-t^s)^{d+1}$ and hence are zero within sets of $s$. For example, if $s = 3$, $d = 2$,

$$(1 + t + t^2)^3 = 1 + 3t + 6t^2 + 7t^3 + 6t^4 + 3t^5 + t^6,$$

and the scheme is represented by

$$b_1 + 7b_2 + b_3,\ 6b_2 + 3b_3,\ 3b_2 + 6b_3,\ b_2 + 7b_3 + b_4, \dots.$$

The separation, it is seen, also represents the number of observations between the introduction of successive constants $b_i$.

This notation does not cover all possible schemes, since ‘hybrid’ schemes of straight lines and parabolas exist, but these are of little interest. Of more practical interest are non-symmetrical schemes, in which fresh constants suddenly appear and slowly fade away:

$$2b_1 + 3b_2,\ b_1 + 4b_2,\ 2b_2 + 3b_3,\ b_2 + 4b_3, \dots$$

gives an elementary example of such a scheme, but these have yet to be investigated.

In the following section the emphasis will be on schemes of separation two, since these are the most efficient, but the results are applicable to schemes of all separations.
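The generating rule can be turned into a short routine. In this sketch (hypothetical helper names; rows are trend values, columns the coefficients of $b_1, b_2, \dots$), the coefficient of $b_k$ in the $j$th trend value is taken to be $a_{s(k-1)-j}$, which reproduces the listing above for $s = 3$, $d = 2$; the point at which the listing starts is an arbitrary labelling choice. Taking $(d+1)$th differences then leaves non-zero rows only at every $s$th position, as the argument via $(1-t^s)^{d+1}$ predicts.

```python
def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out

def generator_coeffs(s, d):
    """a_0, a_1, ... of (1 + t + ... + t^(s-1))^(d+1)."""
    a = [1]
    for _ in range(d + 1):
        a = poly_mul(a, [1] * s)
    return a

def trend_values(s, d, n_values, n_constants):
    """Row j: coefficients of b_1, ..., b_K in the j-th trend value (b_k has coefficient a_{s(k-1)-j})."""
    a = generator_coeffs(s, d)
    def coef(r):
        return a[r] if 0 <= r < len(a) else 0
    return [[coef(s * (k - 1) - j) for k in range(1, n_constants + 1)] for j in range(n_values)]

def differences(rows, order):
    """Forward differences of the trend values, taken coefficient by coefficient."""
    for _ in range(order):
        rows = [[y - x for x, y in zip(r0, r1)] for r0, r1 in zip(rows, rows[1:])]
    return rows

for row in trend_values(3, 2, 4, 4):
    print(row)    # b_1 + 7b_2 + b_3, 6b_2 + 3b_3, 3b_2 + 6b_3, b_2 + 7b_3 + b_4

for j, row in enumerate(differences(trend_values(3, 2, 15, 9), 3)):
    print(j, row)  # third differences: zero unless j is a multiple of 3
```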



77 


M. H. Quenouille 

3. End-adjustments*

The methods of trend generation considered in the last section have all regarded the series as starting at some arbitrary point, and while this is perfectly justified, it is possible to carry out end-corrections which enable the subsequent analysis to be simplified. For example, if the set of constants

$$b_0 + b_1,\ 2b_1,\ \dots,\ 2b_n,\ b_n + b_{n+1}$$

are fitted to the observations $x_1, x_2, \dots, x_{2n+1}$ by the method of least squares, this amounts in effect to the rejection of the end-observations, since these are used to determine $b_0$ and $b_{n+1}$. The same result may be obtained by fitting the constants

$$2b_1 + a - c,\ 2b_1,\ b_1 + b_2,\ \dots,\ 2b_n,\ 2b_n + a + c,$$

which minimizes

$$2\{\tfrac12(x_1 + x_{2n+1}) - a - b_1 - b_n\}^2 + 2\{\tfrac12(x_{2n+1} - x_1) - c + b_1 - b_n\}^2 + (x_2 - 2b_1)^2 + (x_3 - b_1 - b_2)^2 + \dots + (x_{2n} - 2b_n)^2.$$

It will be assumed in the following sections that $a = 0$, i.e. that the slopes at either end of the range are equal, and that $x_1$ and $x_{2n+1}$ are only one-half as accurate as the other observations. These assumptions are unlikely to be true in practice, but it is immediately obvious that the effect of deviations from these assumptions will tend to cancel each other and that for many purposes their effect will be negligible. The advantage of such assumptions lies in the fact that, if $x'_1 = \tfrac12(x_1 + x_{2n+1})$, then $x'_1, x_2, \dots, x_{2n}$ are circularly related, the least-squares equations are derived by minimizing

$$(x'_1 - b_n - b_1)^2 + (x_2 - 2b_1)^2 + (x_3 - b_1 - b_2)^2 + \dots + (x_{2n} - 2b_n)^2,$$

and the matrix of the least-squares equations is a symmetrical circulant.
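The circulant structure can be seen by forming the least-squares (normal-equation) matrix for the sum of squares just written. In the sketch below (hypothetical helper names; it assumes $n \ge 3$), each row of the design matrix lists the coefficients of $b_1, \dots, b_n$ in one fitted value, in the order $x'_1, x_2, x_3, \dots, x_{2n}$. The normal matrix comes out with 6 on the diagonal and 1 in the two cyclically adjacent positions, each row being a cyclic shift of the first.

```python
def circular_design(n):
    """Rows for x'_1, x_2, x_3, ..., x_{2n}; entry k-1 of a row is the coefficient of b_k."""
    rows = []
    for j in range(n):
        odd = [0] * n          # x'_1 -> b_n + b_1, x_3 -> b_1 + b_2, x_5 -> b_2 + b_3, ...
        odd[(j - 1) % n] += 1
        odd[j] += 1
        even = [0] * n         # x_2 -> 2 b_1, x_4 -> 2 b_2, ...
        even[j] = 2
        rows.extend([odd, even])
    return rows

def normal_matrix(rows):
    """X'X, the matrix of the least-squares equations."""
    k = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]

for row in normal_matrix(circular_design(6)):
    print(row)
```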

In general, end-adjustments will be made to reduce the matrix to a circulant. For example, for the scheme of separation two and degree two, if

$$x'_1 = \tfrac12(x_{2n+1} + x_1), \qquad x'_2 = \tfrac12(x_{2n+2} + x_2),$$

then $x'_1, x'_2, x_3, \dots, x_{2n}$ are circularly related, provided that we assume that the variances of $x'_1$ and $x'_2$ are equal to the variances of the other observations.

4. The inversion of symmetrical circulant matrices

Under the assumptions of the last section, the problem of solving the least-squares equations is that of inverting a symmetrical circulant matrix with a large number of zero elements. For example, the fitting of $2b_1,\ b_1 + b_2,\ 2b_2$, etc., necessitates the inversion of

$$A = \begin{pmatrix}
6 & 1 & 0 & \cdots & 0 & 1\\
1 & 6 & 1 & \cdots & 0 & 0\\
0 & 1 & 6 & \cdots & 0 & 0\\
\vdots & & & \ddots & & \vdots\\
0 & 0 & 0 & \cdots & 6 & 1\\
1 & 0 & 0 & \cdots & 1 & 6
\end{pmatrix} \quad \text{of order } n.$$

* The following method of end-adjustments should be compared with the method given by Yates (1948).






The number of different non-zero elements not lying in the diagonals will be termed the extent $e$ of the matrix. It will be readily seen that the extent of the matrices derived from schemes of separation $s$ and degree $d$ is $d - [d/s]$, where $[\ ]$ represents the integer part, since this is one less than the maximum number of non-zero coefficients of the $b_i$ in any term, which from §2 is

$$\Big[\frac{\text{number of non-zero } a + s - 1}{s}\Big] = \Big[\frac{(d+1)(s-1) + s}{s}\Big] = d + 1 - \Big[\frac{d}{s}\Big].$$

Table 1 gives some values of this function.

Table 1. Extent of least-squares matrix

It will be seen later that the inversion of a circulant matrix is equivalent to solving an equation of degree equal to the extent of the matrix, and of inverting a matrix of order equal to the extent of the original matrix, irrespective of the order $n$ of the original matrix. Fortunately, it will be seen that we are primarily interested in schemes of extent less than three.

The inverse of a symmetrical circulant matrix m ay be written in the form 


— 

o 


«2 • 

Mg 

— 

«i 

«0 

U 1 ■ 

• u 3 



Ui 

u 0 . 

■ «4 

«3 


u 3 

u 4 . 

- « 0 

"l 

K 

"Mg 

u 3 . 

. iq 

«0 _ 


and the u t will be related linearly. For example, for the matrix A 

Ui+bu i+1 + u i+2 = 0 (i = 0 , 1 , 2 , ..., 0 ]- 2 ), 

with supplementary conditions at either end of t he range for i. The principal relation between 
the iq may be solved by putting and'obtaining a set of 2e roots iq, I/ft.y,. l/y„ .... where 
1 > 2/i>ya>2/ 3 ■■■> 80 that u { = + a 2 yi +-\-.... 

This may be further simplified using the end-conditions for i > [iji] _ 2, which give 
A — i so that the equation for rq may be written 





79 


M. H. Quenouille 

where G is the first coefficient in the recurrence relation for u f (usually unity), and c lt c 2 ,... 
are constants which can be determined using the end-conditions for i small. These give the 
equations 

S CtW-Vi*) = 0 (0 ^p<e), 

= 1 (P = e), 

which are independent of n, the order of the matrix. 

e 

If we now let = lim u { = £ c^j), 

co j l 

then u i = ^i + ( u »-t + Vt-i) + (%i-i + l W;) + ---- 

Tor example, for matrix A,y x = - 3 -I-2^/2 and w £ = v /3 J( - 3 + 2f2)K The values of v ( for 
many of the more important forms of trends are given in Table 2 and also the values of 

V = v n + 2 S 

t=i 


Table 2. Values of i> i 


Separation a 


Degree d 


Extent e 



1767707 
-303301 
82038 
- 8928 
1032 
-263 
45 
-8 
1 


625000 
- 208333 
60444 
-23148 
7710 
-2572 
857 
-280 
95 
-32 
11 
-4 
1 


228918 
- 109330 
49094 
-21930 
0791 
-4371 
1952 
-871 
389 
-174 
78 
-35 
15 
-7 
3 

- 1 
1 


87340 

- 52408 
28329 

- 15024 

7938 
-4191 
2212 
-1108 
010 
- 325 
172 
- 1)1 
48 
-25 
13 
-7 
4 

-2 

1 


0-1249999 


0-0312498 


0-0078126 


0-0019530 


Vx 

Vi 


-0-1715729 


-0-3333333 


-0-4464628 

-0-0395661 


-0-5278641 

-0-1055728 









Table 2 (cont.). Values of v t 


Separation s 

3 

3 

4 

4 

Degree d 

1 

2 

1 

2 

Extent c 

1 

2 

l 

2 


x 10~ 7 

x 10- 7 

x IO- 7 

10- 8 

Vo 

t»i 

Vo 

H 

Vo 

v-, 

v i 

v % 

t'ro 

»ll 

u 12 

t’l3 

! V> 

580259 
- 128115 
28286 
-6245 

1379 

-304 

67 
- 15 

3 

- 1 

99463 

-40561 

15853 

-6182 

2411 

-940 

367 

-143 

56 

-22 

9 
— 3 

1 

255155 

-61341 

14747 

-3545 

852 

-205 

49 
- 12 

3 

- 1 

253225 
- 109747 
45027 
- 18401 

7518 
- 3071 

1255 
- 513 

209 
— 86 

35 

-14 

6 

— 2 

1 

V 

<3-0370369 

0-0041155 

0-0156249 

0-00097659 

Vx 

y% 

-0-2207890 

-0-3899122 

-0-0212657 

-0-2404082 

- 0-4085474 
-0-0301991 


aobsoi 

5 

«> 

li 

6 

Degree d 

1 


1 

2 

Extent e 

1 

2 

1 

O 


X 19-8 

x 10 8 

X JU 8 

x 10 0 

l V> 

1333333 

86767 

780488 

350881 

l 'l 

- 333333 

- 38162 

- 1'10304 

- 158322 

V 2 

83333 

16089 

60894 

<17065 

Vo 

- 20833 

- 6667 

- J2096 

- 28257 


5208 

2779 

331!) 

11900 

»•« 

- 1302 

- 1158 

- 8-17 

-6011 

Vo 

326 

483 

216 

21 10 


-81 

-201 

— 55 

-889 


20 

84 

14 

374 

Vo 

-6 

-35 

-4 

- 158 

”10 

1 

15 

1 

66 

»n 

— 

— 6 

-- 

-28 

»i> 

— 

3 

— 

12 

y 13 

— 

- 1 

— 

- 5 

v lt 

— 

— 

_ 

2 

«« 

— 

— 

— 

-1 

V 

0-00800001 

0-00032003 

0-00462964 

0-000128597 

Vx 

-0-2500000 

-0-4167796 

-0-2553580 

— 0-42L11965 

j 


-0-0346458 

— 

-0-03716109 







M. H. Qtjenouille 
Table 2 (cont.). Values of v i 


81 


Separation s 

7 

8 

8 

9 

Degree d 

1 

1 

2 

1 

Dxtent e 

1 

1 

2 

1 


X 

1— « 

0 

r 

00 

x 10- 8 

x 10-“ 

x 10- 8 

Vq 

494971 

333126 

84767 

234713 

v i 

- 128020 

-86877 

- 38765 

-61660 

v 2 

33111 

22657 

16596 

16146 

v 3 

-8564 

- 5909 

-7063 

- 4235 

v l 

2215 

1541 

3004 

1111 

Vq 

-573 

-402 

-1278 

-291 

v e 

148 

105 

544 

76 

V, 

-38 

-27 

-231 

-20 

U 8 

10 

7 

98 

5 

«0 

-3 

-2 

-42 

- 1 


1 

— 

18 

— 

v n 

— 

— 

-8 

— 

v ls 

— 

— 

3 

— 

Vn 

— 

— 

-1 

— 

V 

0-00291545 

0-00195312 

0-000030017 

0-00137175 

2/i 

-0-2680413 

-0-2607940 

-0-42033078 

-0-2622800 

Vi 

— 

— 

- 0-03970799 

— 


Separation s 

10 

12 

16 

20 

Degree d 

1 

1 

1 

1 


1 

1 

1 

1 


X 

p 

oc 

X 10-8 

x 10-" 

x 10- 8 


171499 

99546 

42122 

21597 


-45164 

- 26354 

- 11211 

-5762 


11894 

0977 

2984 

1537 

Va 

-3132 

-1847 

-794 

-410 

v t 

825 

489 

211 

109 

v s 

-217 

-129 

-56 

-29 

V* 

57 

34 

15 

8 

Vi 

-15 

-9 

-4 

-2 

Vs 

4 

2 

1 

1 

V » 

-1 

-1 

— 

— 

V 

0-00099999 

0-00057870 

0-00024414 

0-00012501 

20 

-0-2633479 

-0-2647455 

-0-2661424 

-0-2667914 


Biometrika 36 


6 










g 2 On a, method of trend elimination 

which will be used in subsequent applications of this theory. Thus, for example, if parabolas 
of separation two were fitted to twenty observations, two observations would be used in 
making the end-corrections, and hence d = 2, s = 2, n ~ 9, e = 1, and the inverse matrix 
contains five elements, which are given by 

u 0 : 0-0625000-2x0-0000032 = 0-0624936 

Ul - -0-0208333+ 0-0000095+ 0-0000011 = -0-0208227 
u 2 : 0-0069444-0-0000286-0-0000004 = 0-0069154 

u 3 : -0-0023148 + 0-0000857 + 0-0000001 = -0-0022290 
« 4 : 0-0007716-0-0002572 = 0-0005144 

Similarly, for eighteen observations, 

u 0 : 0-0625000 + 2x0-0000095 = 0-0625190 

Ul : — 0-0208333 — 0-0000286 - 0-0000032 = - 0-0208651 
u 2 : 0-0069444 + 0-0000857+0-0000011 = 0-0070312 

m 3 : -0-0023148-0-0002572-0-0000004 = -0-0025724 
u { : 2x0-0007716 + 2x0-0000001 = 0-0015434 


5. Determination of the form of trenp 

Given a set of observations it will be necessary to decide the degree and separation of the 
appropriate form of trend. The separation which might be compared with the number of 
observations in each block of a randomized block design will depend largely upon tire form of 
the observations and the tests that are subsequently to be made. Although trends of separa¬ 
tion two are in general the most efficient forms, if the number of observations is small the 
number of degrees of freedom may necessitate the use of a trend of higher separation. Also if 
it is desired to test or eliminate a cyclic effect then the separation may be taken equal to the 
period so that the trend and cyclic effect are orthogonal. The degree of the appropriate trend 
forms a larger problem, and it is necessary to investigate these forms further to decide upon 
the appropriate degree. However, it is always possible to fit forms of increasingly higher 
degree until no further improvement is obtained. 

If n observations are taken, then n comparisons rnay be made of which n — 1 are inde¬ 
pendent. Thus x 1 ~x 2 , x 2 ~x v ..., x n _ x -x n , x n -x 1 form n such comparisons. These can be 
combined to give an estimate 

2 2 K--a- w ) 2 /w(w-l) 

i~ 1 3 = 1 


of the error variance, which under normal-law theory will be the most efficient. If, however, 
a trend is present, then comparisons such as x 1 ~x i = aq — ,r 2 + x 2 — x 3 + — aq will tend to 
increase the estimate of error variance so that it is necessary effectively to rule out such com¬ 
parisons in favour of the more precise comparisons ai f — x i+1 . This is done by omitting every 
sth comparison x i — a: i+1 and combining the remainder in the most efficient manner, thus 
giving the randomized block analysis, For example, randomized blocks of two correspond to 
the comparisons 


#1 ^6—©to., 


while randomized blocks of three correspond to the comparisons 


x 2 ~~ x 3 ’ etc.. 



M. H. Quenotjille 


83 


which are equivalent to 

aq-*3> *i-2ar 2 + * 3 , x 4 — x 6 , as 4 — 2z s + % s , etc. 

Similarly, the comparisons 

~ 2a?2 + a<j, a g x 4 , . • •, 2<iq -j■ 

can be combined to give an estimate of error variance, but if a trend is present this may be 
improved by the omission of every sth comparison. The analysis then corresponds to that of 
a trend of degree one and separation s. Similarly, the comparisons 

aq — 3:r a + 3a: 3 — x 4 , x 2 — ;te 3 -(- ;i.r 4 — x 6 , etc., 

can be used to form trends of degree two, etc. 

The main consequence of this is that the appropriate, degree of suitable schemes can be 
found exactly for schemes of separation two and with a high degree of accuracy for schemes of 
higher separation by the variate-difference method, and that the residual variance for 
schemes of separation can be directly estimated using the variate-difference method. If it is 
desired to test another effect simultaneously then this should be eliminated before using the 
variate differences. It should perhaps bo noted that the variance-difference method is in fact 
more appropriate for this purpose than for the general problem of fitting a polynomial to a set 
of observations. Thus it will he seen from 'fable S that because of a small rounding-off error 
we should be led to suppose that a cubic curve could represent the normal curve from 
x = -3 to x = 3. Similarly for the trend n, n— 1, 2, I, 0, 1, 2, ..., w, the first five variate 

differences are 2 I 12 10 

l> 3(2»-lj’ 5(n—f)’ 35(2ra^3)’ 

The ratios of these tend rapidly to unity, so that a random element would completely obscure 
any slight difference and we should he led to suppose that a straight line or possibly a parabola 
would adequately represent the data. 

6. Advantages and disadvantages on the method- 

This method of trend elimination would seem to be most useful for cross-correlations which 
can be evaluated by the analysis of covariance. Its application, however, to serial correlations 
is both complicated and rather dubious in nature, since successive residuals will not be inde¬ 
pendent. 

Its application to stochastic trends will in general not load to any spurious results, but it 
will cause an unnecessary loss of information. For example, consider the simplest caso, 
namely, the cumulative sum. Obviously tho employment of first differences will eliminate 
the stochastic trend and leave uncorrelated variables, so that the use of randomized blocks 
will lead to unbiased results hut will unnecessarily reject one-half of the observations. The 
use of randomized blocks of a higher separation well not reject as many observations but will 
assume that successive differences are negatively correlated. Thus, for example, if e t and e (+1 
are successive differences in randomized blocks of three, e t — e i+J and e t + e i+1 will be used as if 
they were independent with relative variances 3:1. This will in general lead to unbiased 
estimates of cross-correlations, although information will again be lost as a result of the 
incorrect weighting of these comparisons. 

The method might also be used to estimate the errors of systematic sampling. Yates (1948) 
has already shown that, with caution, the sum of squares corresponding to the position of the 

6-2 



84 On a method of trend elimination 

observation in each block might be used to estimate the accuraoy of a systematic sample. 
For this purpose the trend elimination is obviously unnecessary. However, since the residual 
mean square will provide an upper limit for the variation between such samples it might on 
occasions be used to supplement the estimate obtained by Yates s method. 

7. Examples op the method 

(a) As a first example of the method, the cubic used by Kendall (1 947 ) to demonstrate the 
variate-difference method was used. This cubic was 

x t = (t~ 26)+-^(f-26) 2 -f T ^ Ef (t — 26) 3 -t-e 4 , 

where e t was randomly chosen between 0 and 99. For this series, Kendall gave the following 
estimates of the second moments of the variate differences: 



Estimate 

1 

1075-41 

2 

1082-02 

3 

1076-58 

4 

1047-21 

5 

1011-05 

6 

075-20 


so that randomized blocks of two should eliminate the trend adequately. Nevertheless, to 
demonstrate the method the scheme a = 2, d = 1 has been used, i.e. straight lines have been 
fitted to successive sets of three points. This has been done in Table 3. The first column gives 
the values of x is i = 1, ..., 51, while the second column gives the adjusted end-values in this 
case by the formula x[ = + « 61 ). The third column gives X t = x ti+1 + 2?; 2i+2 + x 2i+3 for the 

adjusted values. The elements of the inverse matrix now have to be calculated. In this case 
n = 25, so that these may be read directly from Table 2 and applied in the form of a moving 
average to the X { to give the values of B t shown in column 4. These have been given exactly, 
although normally four decimal places would suffice, so that the application of the automatic 
checks can be seen. The total of the second column is derived from the total of the first by 
subtracting 7fj-5, while tire total of the third is derived from this by multiplying by 4, i.e, 
by s d+1 . The total of the fourth column is theoretically derived from the total of the third by 
multiplying by s- 2<2_1 , but owing to rounding-off errors in the v this is not exactly true, and for 
this reason the value V is given in Table 2. Thus, in this case the total of column 4 is 

6545-0 x 0-1249999 = 818-1243455 

as compared with its theoretical value of 818-1250000. The difference is small, but the exact¬ 
ness of the check makes it worth noting. To complete the analysis the sum of squares of the 
adjusted x it the sum of squares for blocks B t , and the sum of squares for the position in 
blocks ]py(Xa; 2i — Sa: 2i+1 ) 2 , where x[ is taken instead of x 1 , were all calculated to give the overall 
analysis shown in Table 4. The correction for the mean, 2-5943403, should be removed from 
the blocks sum of squares if it is desired to test the effectiveness of the method, but this is 
usually unnecessary. The agreement of the mean square with Kendall’s value is very striking 
and much better than would normally be expected. The smallness of the square due to the 




M. H. Qtjenouille 


81 


Table 3 . Method of trend elimination 


Xi 

x { 

X* 

Bi 

-96 

76-5 



-90 

— 

- 120-5 

-40-36809720 

-17 

— 



-32 

— 

-92 

-5-60017485 

-11 

— 



-59 

— 

-97 

- 18-03089050 

32 

— 



28 

— 

110 

16-78546010 

22 

-- 



02 

— 

196 

27-31782615 

50 

— 



2 

— 

133 

15-30711740 

79 

— 



-7 

— 

139 

13-83899325 

74 

— 



85 

— 

259 

40-65855725 

15 

— 



-4 

— 

08 

1-20958165 

01 

— 



39 

— 

140 

20-08401820 

1 

— 



51 

— 

150 

18-28619100 

53 

— 



48 

— 

191 

20-19874130 

42 

— 



10 

— 

137 

15-52117180 

75 

— 



37 

— 

101 

17-67397240 

12 

— 



96 

— 

274 

39-43470980 

70 

— 



30 

— 

182 

19-71714810 

52 

— 



64 

— 

214 

24-26195915 

34 

— 



126 

— 

344 

48-71062335 

58 

— 



57 

— 

207 

27-47401355 

95 

— 



75 

— 

403 

53-44517700 

158 




99 


454 

54-85493515 

98 




159 

— 

572 

71-42532170 

156 

— 



180 

201 

— 

717 

88-59298800 

239 

— 

900 

114-01036025 

221 

— 



270 

— 

837-5 

127-30858030 

249 

“ 








Total 3349 


3272-5 


6545-0 


818-12434550 






86 On a method of trend elimination 


Table 4, Analysis of variance for trend elimination 



Degrees of 
freedom 

Sum of 
squares 

Mean 

square 

blocks 

25 

448274-31 


Position in block 

1 

406-12 


Residual 

24 

25777-82 

1074-08 

Total 

50 

474458-25 



Table 5. Variate differences of e~ x in the range 0-0 (0-1) 3-0 


Difference 

Estimate x 10 10 

1 

8306033-3 

2 

25973-6 

3 

82-0 

4 

8-8 

5 

9-4 


position in block is as foreseen, but it seems advisable to remove this in all analysis as a safe¬ 
guard against an inadequate fit. It is fairly obvious an oscillatory movement taking, say, 
values ± 10 alternately would be reflected in a term of about 5000 and could thus be detected. 
(b) As a second example values of e~ x were taken to four decimal places for 0-0 (0-1) 3-0. 
Theoretically, successive variances obtained by the variate-difference method will decrease 
indefinitely, but in fact erroi's will prevent the calculation of more differences than are of 
practical importance. Thus in this case the rounding-off errors have an expected variance of 
8-3 x 10 10 , and Table 5 verifies that the variance is steady at this level after the third 
difference, so that there is no advantage in fitting schemes of degree greater than three. 
A scheme of degree tlrree was fitted in the same manner as for example (a), except that in this 
case 

*=1(^1 + 73:29), *2 = i (*2 + * 30 ), x' s = J(7*3 + * S i) 

and X* = *M+1 + 4 *2i+2 + e *2i+3 + 4 *2 i+4 + *21+5- 

The calculation of end-corrections and the form of X are not very difficult, especially for 
schemes of degree one where we have 


x > - 5. ±It 
*1 ~ 


■1)*< 




Xn — 


2x z + (s- 2)x m+ i _ 3*3 + (s - 3 ) x. 

■ X<\ — —— —-' 




etc., 


Xi ~~ Xsi+1 + 2 * si + a+ 3a: ^+3 + ’ 1 • •■ + sa W> <+(* -1) sWd i+i + -+ aw 2 ) t-i • 

ghmir+f llble o aIati0n fUTther the C0rresp0nding coefficients to? schemes of higher degree are 

The completed analysis for the trend elimination for r* is shown in Table 7. The residual 
mean square !s small and for most purposes the trend elimination would be sufficiently good, 





M. H. Quenouille 87 


Table 6. Coefficients for end-corrections and the representative equation 


s d e 

x t 

x, 

2 2 1 

i (1, 3), 1 (3, 1) 

1, 3, 3, 1 

2 3 2 

KUbilbiUP.,!) 

1, 4, 6, 4, 1 

2 4 2 

* (1, 15), A (5, 11), iV (11, 5), A (15, 1) 

1, 5, 10, 10, 5, 1 

3 2 2 

Ml, 8), i (1, 2), i (2, 1), * (8, 1) 

1, 3, 0, 7, 6, 3, 1 

4 2 2 

A (1, 15), A (3, 13), i (3, 5), * (5, 3), A (13, 3), 

1, 3, 6, 10, 12, 12, 10, 0, 


* (15, 1) 

3, 1 

5 2 2 

A (1, 24), A (3, 22), A (8, 19), i (2, 3), 

1, 3, 0, 10, 15, 18, 19, 


MS, 2), ... 

18, 15, ... 

C 2 2 

A (1, 35), * (1, 11), i (1, 5), A (5, 13), A (5, 7), 

1, 3, (i, 10, 15, 21, 25, 27, 


iV (7, 5), ... 

27, 25, ... 

8 2 2 

A (1, 63), A (3, 61), A (3, 29), A (5, 27), A (15, 49), 

1, 3, (5, 10, 15, 21, 28, 36, 


A (21, 43), A (7, 9), A (9, 7), ... 

_ 

42, 46, 48, 48, 46, ... 


For example, for the scheme s = 3, d = 2, e = 2, 

+-i), ^'2 = ^a “ fi (2.r a 4--^'a 11-,-al! 

= 1(^*4 + a: an+4)i = *3<+i + ^af+aTihiif-w + 7a' a ( + 4 + .... 


but this mean square is very much largerthanits theoretical value, and from this viewpoint the 
trend elimination has not been successful. In fact, the difficulty arises owing to the different 
slopes at either ends of ranges one being roughly twenty times the other. This can be demon¬ 
strated by analysing e~ x> or, in this case, the normal ordinate. Table 8 shows that the variate 
differences behave in the same manner as previously, while tho analysis of variance in 
Table 9 shows a reduction in the residual mean square due to the slopes at either end of the 
range being more comparable. Table 10, which gives the residuals from the two analyses, 
further verifies that the end-adjustments are largely the cause of tho residual. It must bo 
noted that the sums of squares in Table 10 do not agree with the residual sums of squares 
given in Tables 7 and 9. This is due to the rounding-off errors in v i which cause a rounding-off 
error of 0-0000001 or one in 78125 in V. Usually, of course, this will be of no importance, but 
in this case where the mean square due to trend is roughly 10 8 times the true error mean square 
it is of greater importance. The values given in Table 10 are the more accurate. 

In general, the residuals from any fitting will indicate whether the end-corrections are 
affecting the analysis. If the end-corrections are affecting the analysis then further analysis 
will usually be necessary. The observations can be fitted without end-corrections or a form of 
partial end-correction can be used, although either method will be fairly lengthy. The calcu¬ 
lation can, however, be greatly shortened using the above inverse matrices, For example, tho 
fitting of 

2b v b x + b z> 2 b t , .... 2 b n , 

involves the inversion of 

rs i o o ... o o'l 


1 

6 

1 

0 ... 

0 

0 

0 

1 

6 

I ... 

0 

0 

0 

0 

0 

0 ... 

6 

1 

0 

0 

0 

0 ... 

1 

5 






88 


On a method of trend elimination 


We may multiply this matrix by 

r 6 
1 
0 


: 


1 0 0 ... o 

6 1 0 ... 0 

1 6 1 ... 0 


0 0 0 ... 6 
0 0 0 ... 1 



® J 


Table 7. Trend elimination analysis for e~ x 


X, 


Xt 


1-0000 

0-1782 

4-0510 

0-0114487107 

0-9048 

0-4799 



0-8L47 

0-7191 

10-0459 

0-1050190925 

0-7408 

_ 



0-6703 

— 

10-6789 

0-0820532406 

0-6065 

__ 



0-6488 

— 

8-8248 

0-0688415858 

0-4966 

_ 



0-4493 

— 

7-2253 

0-0566250230 

0-4066 

_ 



0-3679 

— 

5-9169 

0-0458760583 

0-3329 

— 



0-3012 

— 

4-8433 

0-0374148344 

0-2726 

— 



0-2466 

— 

3-9651 

0-0306796883 

0-2231 

_ 



0-2019 

_ 

3-2465 

0-0251287380 

0-1827 

— 



0-1653 

-- 

2-6582 

0-0206073968 

0-1496 

— 



0-1353 

— 

2-1763 

0-0169669999 

0-1225 

— 



0-1108 

-- 

1-7820 

0-0134806760 

0-1003 

— 



i 0-0907 

— 

1-4689 

0-0119720304 

0-0821 

— 



0-0743 

-- 

1-3119 

0-0076813439 

0-0672 

— 



0-0608 

— 



0-0550 

— 



0-0498 

— 



Total 

8-5230 

68-1840 

0-5326943186 



Degrees of 
freedom 

Sum of 
squares 

Mean 

square 

Blocks 

14 

3-7858845 


Position in block 

1 

0-0000005 


Residual 

13 

0-0004302 

0-0000331 

Total 

28 

3-7863152 











Table 8. Variate differences of the normal ordinate in the range 0-0 (0-1) 3-0 


Difference 

Estimate variance x 10 10 

1 

1173518-3 

2 

5630-6 

3 

59-3 

4 

11-0 

5 

10-8 


Table 9 . Analysis of variance for trend elimination 



Degrees of 
freedom 

Sum of 
squares 

Moan 

square 

Blocks 

14 

1-1853296 


Position in block 

1 

0-0000027 


Residual 

13 

0-0000527 

0-0000041 

Total 

28 




Table 10. Residuals from the fitting of eand the normal ordinate 


c~ x 

Normal ordinate 

-0-00319 

-0-00207 

U-01403 

0-00597 

- 0-00452 

-0-00284 

— 0-00749 

- 0-00)20 

0-00412 

0-00093 

0-00292 

0-00002 

-0-00193 

-0-00044 

-0-00127 

- 0-00023 

0-00083 

0-00017 

0-00060 

0-00011 

-0-00039 

-0-00013 

- 0-00026 

-0-00004 

0-00010 

0-00002 

0-00012 

0-00004 

- 0-00002 

— 0-00003 

-0-00013 

— 0-00001 

- 0-00O06 

0-00001 

0-00016 

— 0-00003 

0-00016 

0-00005 

-0-00030 

— 0-00OOI1 

-0-00049 

- 0-00004 

0-00071 

0-00001 

0-00098 

0-00028 

-0-00151 

— 0-00037 

-0-00229 

- 0-00032 

0-00349 

- 0-00054 

0-00479 

0-00086 

-0-00932 

-0-00102 

Total -0-00010 

-0-00012 

8 um of squares 0-000447072 

Sum of squares for 

0-000057924 

position m block 0 000446009 

0-000055171 










90 On a method of trend elimination 

giving 

u 0 U-i 

u t 1 

«2 0 

u 2n-2 0 

_ u 2n-l u 2n-2 u 2n-3 - - • U 1 u 0 J 

which is more easily inverted. However, the simplest, though not the most efficient, method 
would seem to be the use of an analysis of covariance with dummy variates (Bartlett, 1937) 
to remove the effect of the end-terms. This would involve a covariance analysis on as many 
dummy variates as there are end-corrections, and will obviously be most useful for the 
simpler forms of trend, i.e. for trends of lower separation. Thus if we wish to correct a cross- 
correlation between series, a covariance must be used eliminating dummy variates repre¬ 
senting the end-corrections. 

However, the covariance analysis may be partitioned into components representing 
adjustments for the differences in the first, second, .... differentials at the beginning and at 
the end of the observations. Thus, for example, in this case if the slopes at either end of the 
series are b - c and b + c, the observations may be taken as 

a x — 36 + 3c, a 1 -26 + 2c, a^b+c, a 1: ..., n 2 , a 2 4-6 + c, fl 2 + 2b + 2e, a 2 + 36 + 3c, 
and the adjusted observations are 

^(a x -l-7a a +46-1- 10c), ^(4a 1 + 4a 2 +16c), 3 8 (7a 1 +a 2 -46 +10c), a x . 

Thus a covariance analysis on 6, 8, 5, 0, 0, ... 

will remove the portion on the residual variance which is due to the difference in slope at 
either end of the range. Similarly, a covariance on 

1 , 0 , - 1 , 0 , 0 , ... 

will remove the portion which is due to the difference in second differentials at either end of the 
range, etc. The great advantage of this device is that when we are dealing with forms with a 
large number of end-corrections, instead of carrying out a covariance analysis on this 
number of dummy variates, if necessary it will usually be sufficient to carry out one, or 
possibly two, analyses on 

1x9, 2x8, 3x7, 4x6, 5x5, 6x4, 7x3, 8x2, 9x1, 0, 0, ..., 

i-e. 9, 16, 21, 24, 25, 24, 21 , 16, 9, 0, 0, ... 

should suffice. 

In the above examples joint covariances on the three adjusted end-observations remove 
0-000446223 in the e~ x analysis and 0-000055030 in the normal ordinate analysis. Thus within 
the limits of accuracy imposed by the rounding-off errors in v it these covariance analyses 
account for the residual variances given in Table 10. These further analyses lead to corrections 
in the adjusted values of-0-0218, -0-0423and -0-0172 fore- 1 and -0-0008, -0-0102and 
— 0-0003 for the normal ordinate. In the latter case a covariance for the second adjusted 
observation eliminates nearly all (0-000054972) of the residual variance. If, however, co- 


iij ... ^2,1^2 ^2)1—1 

0 ... 0 ^2n-2 

1 ... 0 u 2n~2 




91 


M, H. Qumouille 

variance analyses are carried out to remove the portions due to the differences in slope, these 
are found to account for 0*000406947 and 0-000027532, which are the major portions of the 

residual variation. 

8. Summary 

It las keen suggested that trend might be eliminated using a series of consecutive poly¬ 
nomials of the same degree. A method has been given whereby the calculation can be rapidly 
carried out provided that the observations are circularly related. Ways of inducing circu¬ 
larity have been given assuming that the differential coefficients at either end of the range of 
observations are equal, and it has been shown how any deviations from this assumption can 
be adjusted using a covariance analysis. 


REFERENCES 

Bartlett, M. S. (1937]. Some examples of statistical methods of research in agriculture and applied 
biology, JT. Statist. Soc . Suppl. 4,137-83. 

Fisher, R. A. (1924), The influence of rainfall on the yield of wheat at Kothanistod. Phik Trans. B, 
213,89-1,42. 

Kendall, M, G. (1947), The Advanced Theory oj Statistics, 2, Griffin and Co, 

Yates, F, (1948), Systematic sampling, Philos. Trans , A, 241,345-77, 



[ 92 J 


ON THE ESTIMATION OF DISPERSION BY LINEAR 
SYSTEMATIC STATISTICS 

By H. J. GODWIN, University College of Swansea 
1. Introduction 

The purpose of this paper is to discuss the efficiency of estimates of the standard deviation of 
a population which are obtained by ranking the observations of a sample and taking a linear 
combination of them. Such statistics are termed systematic by Mosteller (1946). Wo shall 
consider only the case in which the same rule of combination is used for every sample of a 
given size; thus the mean deviation from the mean, which takes different forms according to 
the position of the sample mean relative to the observations, will be excluded from the 
following theory. 

The best-known statistic satisfying the requirements is the range whose probability in¬ 
tegral in samples from a normal population was tabulated by Pearson (1942) and Hartley 
(1942). Earlier, Tippett (1925) had tabulated its mean value for sample sizes 2-1000, and 
given formulae for higher moments, while E. S. Pearson (1926) had calculated second, third 
and fourth moments for several sample sizes. Other measures which are the difference of two 
observations have been proposed, such as the interquartile range, discussed by Hojo (1931), 
and the difference of quindeciles, suggested by K. Pearson (1920). Mosteller (1946) discusses 
these and other differences of symmetrically placed ranks which he calls quasi-ranges. 
Recently, Nair (1947) has considered the mean deviation from the median. He propounds 
several questions as to the usefulness of this statistic, and the present paper has grown from 
an attempt to answer these. I had obtained the frequency function of this statistic in normal 
samples (as Nair (1948) observed, after he had obtained it as a special case of a more 
general result), but finding it intractable for further development did not attempt to 
publish it. 

The method used here is to consider the first and second moments of differences of con¬ 
secutive ranks. These were studied by Irwin (1925) and Pearson & Pearson (1931), the last 
authors deriving results about the differences from a knowledge of the moments and correla¬ 
tions of the ranks themselves. Recently, Hastings, Mosteller, Tukey & Winsor (1947) have 
also tabulated the means, variances and covariances of ranks, and either of their results or 
my results on rank differences given below is deducible from the other. However, my results 
are computed to more places than theirs, and I have thus been able to give a more accurate 
version of their table (Godwin (1949)). Some of the results which they computed have 
been given exactly by Jones (1948), who has evaluated certain integrals connected with the 
normal distribution. I have also extended this table in the paper referred to above. 

2. PlRST AND SECOND MOMENTS Off THE DIFFERENCES BETWEEN RANKS 

Let the population studied have a distribution function F[x), with a frequency function 
f{x) = (djdx) F{x). We require the limits 

lim xF(x) and lim *(1- F{x)) 

*-»--« x->ca 



H. J. Godwin 


93 


to exist and to be equal to zero. This is so if the second moment of/(a;) is finite, hut this is not 
a necessary condition, as illustrated by the case 


/(*) = 


J ±A 

2(1 +1 a: |) 2+A 


(0<A«SI). 


Let x x , x 2 ,.... x n be a sample of n from this population, such that aq x 2 ^ .., < x n . 

Let Vi ~ (1 = 1,2,n—1). 

Any linear function of the *’s can be expressed as a linear function of the y 's added to a 
muftipie of x v If the function is to measure dispersion vro require it to bo aero when all the x'e 
are equal and all the y’s thus zero. Hence the term in x 1 is zero and the statistic is a function 
of the y 's only. Thus to find its first and second moments we need K(y £ ) and E{y i y j ) for all 
i, j, E denoting expectation. 

If in the statistic the coefficients of y t , y n _ t are the same (i = 1, 2,..., [in]), we shall call it 
symmetrical. Its value is then unaltered if the sample (aq, a; 2l ..., x n ) is replaced by 

(h — x n , k x n _ii aq), 

where k is any number. 

It has been shown by Irwin (1925) that 



F l (x) [1 — F(x)] n ~ i dx, 


an expression which, in a sample of n, Irwin denotes by y„ If the distribution is symmetrical, 
E{y i ) = E(y n _ { ) = + E{y n -i)), while if the statistic is symmetrical y i and y n _ t always 

occur added together. In both these cases the quantities wo need can be expressed in terms 
of the integrals 


f{i) = 


F i {x)[l-F{x)] i dx. 

— 00 


( 1 ) 


To obtain the relation we must express $F^{n-2i} + (1-F)^{n-2i}$ in powers of $F(1-F)$. Put

$$2F - 1 = u, \qquad F(1-F) = v = \tfrac14(1-u^2).$$

Then, writing $r = n - 2i$,

$$F^r + (1-F)^r = \{\tfrac12(1+u)\}^r + \{\tfrac12(1-u)\}^r
= 2^{1-r}\sum_{s=0}^{[\frac12 r]} {}^rC_{2s}\,u^{2s}
= 2^{1-r}\sum_{s=0}^{[\frac12 r]} {}^rC_{2s}\,(1-4v)^s.$$

The coefficient of $v^s$ in this is $(-1)^s\,2^{1-r+2s}\sum_t {}^rC_{2t}\,{}^tC_s$.

The sum is the coefficient of $x^{-2s}$ in $(1+x^{-1})^r(1-x^2)^{-s-1}$, that is, the coefficient of $x^{r-2s}$ in
$(1+x)^{r-s-1}(1-x)^{-s-1}$, so that

$$\sum_t {}^rC_{2t}\,{}^tC_s = \sum_t {}^{r-s-1}C_t\,{}^{r-s-t}C_s
= \sum_t {}^{r-s-1}C_t\,\big({}^{r-s-t-1}C_s + {}^{r-s-t-1}C_{s-1}\big)
= 2^{r-2s-1}\{2\,{}^{r-s}C_s - {}^{r-s-1}C_s\}.$$
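The combinatorial steps above can be checked mechanically. The following is a minimal sketch in exact rational arithmetic. The function names are illustrative and do not come from the paper. It confirms the closed form for the sum for $1 \le r \le 30$. It also confirms the consequence that the powers of 2 cancel, so that $F^r + (1-F)^r = \sum_s (-1)^s\{2\,{}^{r-s}C_s - {}^{r-s-1}C_s\}\,v^s$.

```python
from fractions import Fraction
from math import comb

def lhs_sum(r, s):
    # sum over t of rC(2t) * tC(s), for t = s, ..., floor(r/2)
    return sum(comb(r, 2 * t) * comb(t, s) for t in range(s, r // 2 + 1))

def closed_form(r, s):
    # 2^(r-2s-1) * {2 (r-s)C(s) - (r-s-1)C(s)}, for r >= 1
    return Fraction(2) ** (r - 2 * s - 1) * (2 * comb(r - s, s) - comb(r - s - 1, s))

def coeff_v(r, s):
    # coefficient of v^s in F^r + (1-F)^r once the powers of 2 cancel
    return (-1) ** s * (2 * comb(r - s, s) - comb(r - s - 1, s))

def expansion(r, F):
    # evaluate the expansion in powers of v = F(1 - F)
    v = F * (1 - F)
    return sum(coeff_v(r, s) * v ** s for s in range(r // 2 + 1))

for r in range(1, 31):
    for s in range(r // 2 + 1):
        assert lhs_sum(r, s) == closed_form(r, s)
    for F in (Fraction(1, 3), Fraction(2, 7), Fraction(9, 10)):
        assert expansion(r, F) == F ** r + (1 - F) ** r
```

Exact fractions are used so that the check is an identity test rather than a floating-point comparison.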


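Therefore, for a symmetrical distribution,

$$E(y_i) = \tfrac12\,{}^nC_i\sum_{s=0}^{[\frac12 n]-i}(-1)^s\,\psi(i+s)\,\{2\,{}^{n-2i-s}C_s - {}^{n-2i-s-1}C_s\}, \qquad (2)$$

where, when $n = 2i$, ${}^{-1}C_0$ is to be taken as zero.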
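Formula (2) can be sketched in a few lines. This is not the author's procedure. It is an illustrative check that assumes a callable `psi(k)` returning $\psi(k)$. The names `mean_gap` and `comb0` are hypothetical. The sketch checks (2) exactly against the rectangular population, for which $\psi(i) = 2(i!)^2/(2i+1)!$ and $E(y_i) = 2/(n+1)$.

```python
from fractions import Fraction
from math import comb, factorial

def comb0(a, b):
    # binomial coefficient, taken as zero when the upper index is negative
    # (this covers the term with upper index -1 that arises when n = 2i)
    return comb(a, b) if a >= 0 else 0

def mean_gap(n, i, psi):
    """E(y_i) for a symmetrical population by formula (2); psi(k) returns psi(k)."""
    r = n - 2 * i
    if r < 0:                                   # use E(y_i) = E(y_{n-i})
        return mean_gap(n, n - i, psi)
    total = sum((-1) ** s * psi(i + s) * (2 * comb(r - s, s) - comb0(r - s - 1, s))
                for s in range(r // 2 + 1))
    return Fraction(comb(n, i)) * total / 2

def psi_rect(k):
    # rectangular population on (-1, 1): psi(k) = 2 (k!)^2 / (2k + 1)!
    return Fraction(2 * factorial(k) ** 2, factorial(2 * k + 1))

for n in range(2, 11):
    for i in range(1, n):
        assert mean_gap(n, i, psi_rect) == Fraction(2, n + 1)
```

With the normal values of $\psi(1)$ and $\psi(2)$ from Table 1, `2 * mean_gap(4, 1, psi) + mean_gap(4, 2, psi)` reproduces the mean range in samples of four.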


Various identities can be obtained by means of this expression; for example, the mean 
ranges for odd sample sizes can be found from the mean ranges for even sample sizes. We can 



94 On the estimation of dispersion by linear systematic statistics 

also obtain the identity, suggested by 'Student' and proved by E. S. Pearson (1926), connecting
$E(y_1)$ with mean ranges, viz.

This is not, however, useful for computation, owing to the large number of additions and
subtractions involved. Starting, as Pearson did, with the mean range to five places of decimals,
we obtain $E(y_1)$ to only two places when $n = 10$.

Irwin also showed that

$$E(y_i^2) = 2\,{}^nC_i\int_{-\infty}^{\infty} F^i(x)\int_x^{\infty}[1-F(y)]^{n-i}\,dy\,dx. \qquad (3)$$

We put

$$\psi(a, b) = \int_{-\infty}^{\infty} F^a(x)\int_x^{\infty}[1-F(y)]^b\,dy\,dx; \qquad (4)$$

for a frequency function symmetrical about the origin, $\psi(a, b) = \psi(b, a)$.

To find $E(y_i y_j)$ we must first find the joint frequency function of $y_i$, $y_j$.

If $j > i+1$ this is

$$\frac{n!}{(i-1)!\,(j-i-2)!\,(n-j-1)!}\int_{-\infty}^{\infty}\int_{x_i+y_i}^{\infty} F^{i-1}(x_i)\,f(x_i)\,f(x_i+y_i)\,[F(x_j)-F(x_i+y_i)]^{j-i-2}\,f(x_j)\,f(x_j+y_j)\,[1-F(x_j+y_j)]^{n-j-1}\,dx_j\,dx_i. \qquad (5)$$

When $j = i+1$ we put $x_j = x_i + y_i$, and omit the terms $f(x_j)[F(x_j) - F(x_i+y_i)]^{j-i-2}$. Although
this case needs special consideration initially, the final formula covers all cases. We now
multiply (5) by $y_i y_j$ and integrate over these variables from 0 to $\infty$. If we integrate by parts
with respect to $y_j$, change the variable of integration to $x_j + y_j$ and integrate by parts with respect to $x_j$, we get

$$\frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!}\int_{-\infty}^{\infty}\int_{x_i}^{\infty}\int_0^{x_j-x_i} y_i\,F^{i-1}(x_i)\,f(x_i)\,f(x_i+y_i)\,[F(x_j)-F(x_i+y_i)]^{j-i-1}\,[1-F(x_j)]^{n-j}\,dy_i\,dx_j\,dx_i,$$

which is the same expression as arises in the case $j = i+1$. Proceeding further we put
$x_i + y_i = u$, integrate by parts with respect to $x_i$, change the order of integration of $u$ and $x_j$,
and integrate by parts with respect to $u$, getting

$$\frac{n!}{i!\,(j-i)!\,(n-j)!}\int_{-\infty}^{\infty} F^i(x_i)\int_{x_i}^{\infty}[F(x_j)-F(x_i)]^{j-i}\,[1-F(x_j)]^{n-j}\,dx_j\,dx_i.$$

On expanding

$$[F(x_j) - F(x_i)]^{j-i} = [1 - \{1 - F(x_j)\} - F(x_i)]^{j-i}$$

by the multinomial theorem, we have

$$E(y_i y_j) = {}^nC_{j-i}\,{}^{n-j+i}C_i\sum_{r=0}^{j-i}\sum_{s=0}^{j-i-r}(-1)^{r+s}\,{}^{j-i}C_r\,{}^{j-i-r}C_s\,\psi(i+r,\, n-j+s). \qquad (6)$$

The mean and variance of any function $\sum a_i y_i$ can now be found from (2), (3) and (6).
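As an illustration, formulae (3) and (6) translate directly into code taking any callable `psi2(a, b)` for $\psi(a, b)$. The function names here are hypothetical. The sketch checks the results exactly against the rectangular values $E(y_i^2) = 8/\{(n+1)(n+2)\}$ and $E(y_i y_j) = 4/\{(n+1)(n+2)\}$, for which $\psi(a, b) = 4\,a!\,b!/(a+b+2)!$.

```python
from fractions import Fraction
from math import comb, factorial

def second_moment(n, i, psi2):
    """E(y_i^2) = 2 nC_i psi(i, n - i), from (3) and (4)."""
    return 2 * comb(n, i) * psi2(i, n - i)

def product_moment(n, i, j, psi2):
    """E(y_i y_j) for i != j, from (6)."""
    if i > j:
        i, j = j, i
    m = j - i
    lead = comb(n, m) * comb(n - m, i)          # n! / {i! (j - i)! (n - j)!}
    total = sum((-1) ** (r + s) * comb(m, r) * comb(m - r, s) * psi2(i + r, n - j + s)
                for r in range(m + 1) for s in range(m - r + 1))
    return lead * total

def mean_and_var(a, n, e1, psi2):
    """Mean and variance of sum_i a_i y_i; a[i - 1] is the coefficient of y_i."""
    idx = range(1, n)
    mean = sum(a[i - 1] * e1(n, i) for i in idx)
    sq = sum(a[i - 1] * a[j - 1] *
             (second_moment(n, i, psi2) if i == j else product_moment(n, i, j, psi2))
             for i in idx for j in idx)
    return mean, sq - mean ** 2

def psi2_rect(a, b):
    # rectangular population on (-1, 1): psi(a, b) = 4 a! b! / (a + b + 2)!
    return Fraction(4 * factorial(a) * factorial(b), factorial(a + b + 2))

for n in range(2, 11):
    for i in range(1, n):
        assert second_moment(n, i, psi2_rect) == Fraction(8, (n + 1) * (n + 2))
        for j in range(i + 1, n):
            assert product_moment(n, i, j, psi2_rect) == Fraction(4, (n + 1) * (n + 2))
```

For example, `mean_and_var([1] * 4, 5, lambda n, i: Fraction(2, n + 1), psi2_rect)` gives the mean $4/3$ and variance $8/63$ of the range in rectangular samples of five.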


3. The most efficient linear systematic statistic
We now show that it is possible to find $a_i$ such that

$$d = \sum_{i=1}^{n-1} a_i y_i$$

is more efficient than any other linear statistic for sample size $n$. The efficiency of $d$ is
inversely proportional to $(\operatorname{var} d)/\{E(d)\}^2$, which we denote by $W(d)$. Writing for brevity

$$E(y_i) = e_i, \qquad E(y_i y_j) = e_{ij},$$






we require $(\sum\sum e_{ij} a_i a_j)/(\sum e_i a_i)^2$ to be a minimum. It is convenient to minimize

$$R = \log\Big(\sum\sum e_{ij} a_i a_j\Big) - 2\log\Big(\sum e_i a_i\Big).$$

We have

$$\frac{\partial R}{\partial a_i} = \frac{2\sum_j e_{ij} a_j}{\sum\sum e_{ij} a_i a_j} - \frac{2e_i}{\sum e_i a_i}. \qquad (7)$$

The $(n-1)$ equations $\partial R/\partial a_i = 0$ may be written

$$\sum_j e_{ij} a_j = \lambda e_i,$$

where $\lambda = (\sum\sum e_{ij} a_i a_j)/(\sum e_i a_i)$.

The determinant of coefficients on the left-hand side is the discriminant of the quadratic form
$\sum\sum e_{ij} a_i a_j$, and is non-zero since the form is positive definite, being $E(\sum a_i y_i)^2$. Hence the $a$'s
may be solved for in terms of $\lambda$, and since only their ratios are required, this is sufficient. It is
found that if the population is symmetrical, then so is the statistic.
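The normal equations $\sum_j e_{ij} a_j = \lambda e_i$ can be solved numerically. Since only the ratios of the $a$'s matter, $\lambda$ may be set to 1 and the solution rescaled so that $E(d) = 1$, when $W(d) = E(d^2) - 1$. The sketch below uses a small hand-written Gaussian elimination, which is our own choice and not the paper's. It shows two cases:

* For the rectangular population it recovers the range.
* For normal samples of four it uses moments built from the Table 1 values and should reproduce the coefficients in Table 3.

```python
from fractions import Fraction

def solve(A, b):
    """Gaussian elimination with partial pivoting (floats or Fractions)."""
    n = len(b)
    M = [list(row) + [b[k]] for k, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def best_linear(e, E2):
    """Solve sum_j e_ij a_j = e_i (lambda = 1), rescale so E(d) = 1; then W(d) = E(d^2) - 1."""
    a = solve(E2, e)
    scale = sum(ai * ei for ai, ei in zip(a, e))
    a = [ai / scale for ai in a]
    m = len(a)
    W = sum(a[i] * E2[i][j] * a[j] for i in range(m) for j in range(m)) - 1
    return a, W

# Rectangular population, n = 6: every coefficient should be equal (the range).
n = 6
k = Fraction(4, (n + 1) * (n + 2))
e_rect = [Fraction(2, n + 1)] * (n - 1)
E2_rect = [[2 * k if i == j else k for j in range(n - 1)] for i in range(n - 1)]
a_rect, W_rect = best_linear(e_rect, E2_rect)

# Normal population, n = 4, moments built from Table 1 via formulae (2), (3) and (6).
p1, p2 = 0.5641895835, 0.0990037941
q11, q12, q13, q22 = 0.5, 0.1955011094, 0.1121677761, 0.0501571621
e1, e2 = 2 * (p1 - 2 * p2), 6 * p2
e11, e22 = 8 * q13, 12 * q22
e12 = 12 * (q12 - q22 - q13)
e13 = 12 * (q11 - 4 * q12 + 2 * q13 + 2 * q22)
a_norm, W_norm = best_linear([e1, e2, e1],
                             [[e11, e12, e13], [e12, e22, e12], [e13, e12, e11]])
```

In the rectangular case every coefficient comes out as $(n+1)/\{2(n-1)\}$ and $W$ as $2/\{(n-1)(n+2)\}$.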

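It remains to show that these values give a minimum value of $R$:

$$\frac{\partial^2 R}{\partial a_i\,\partial a_j} = \frac{2e_{ij}}{\sum\sum e_{ij} a_i a_j} - \frac{4(\sum_t e_{it} a_t)(\sum_t e_{jt} a_t)}{(\sum\sum e_{ij} a_i a_j)^2} + \frac{2e_i e_j}{(\sum e_i a_i)^2}
= \frac{2e_{ij}}{\sum\sum e_{ij} a_i a_j} - \frac{2e_i e_j}{(\sum e_i a_i)^2}$$

from (7).

Hence, by the argument used above, $\sum\sum \dfrac{\partial^2 R}{\partial a_i\,\partial a_j}\,\delta a_i\,\delta a_j$ is a positive definite quadratic form
and $R$ has a minimum.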

The most efficient linear statistic can thus be determined, given the population. The 
converse problem, of finding the populations for which a given statistic is most efficient, seems 
more difficult. A unique solution is not to be expected, since the range is best for both a 
rectangular population and the binomial population with frequencies 1, ,1. It seems likely 
that the ratios of the $a$'s can only lie in certain ranges of values; on intuitive grounds one
would expect $a_2/a_1$, $a_3/a_1$, etc., to be large if the distribution is leptokurtic, so that less
reliance can be placed on the tails, and this is confirmed by a few special distributions for
which I have worked out the best estimate for a sample of four. However, it does not seem
that the correlation between $a_2/a_1$ and $\beta_2$ is exact.


4. Rectangular population

For this population all the integrals which occur can be evaluated explicitly. We have

$$\psi(i) = \frac{2(i!)^2}{(2i+1)!}, \qquad \psi(i,j) = \frac{4(i!)(j!)}{(i+j+2)!}$$

(taking $f(x) = \tfrac12$, $-1 \le x \le 1$), whence, in a sample of $n$,

$$E(y_i) = \frac{2}{n+1}, \qquad E(y_i^2) = \frac{8}{(n+1)(n+2)}, \qquad E(y_i y_j) = \frac{4}{(n+1)(n+2)} \quad (i \ne j).$$

The most efficient linear statistic is the range, $w$, and

$$W(w) = \frac{2}{(n-1)(n+2)}.$$

The mean deviation from the median ($m'$) is defined by

$$2\nu m' = (y_1 + y_{2\nu-1}) + 2(y_2 + y_{2\nu-2}) + \cdots + \nu y_\nu \quad \text{if } n = 2\nu,$$

and by

$$(2\nu+1) m' = (y_1 + y_{2\nu}) + 2(y_2 + y_{2\nu-1}) + \cdots + \nu(y_\nu + y_{\nu+1}) \quad \text{if } n = 2\nu+1.$$

This gives $E(m') = \nu/(2\nu+1)$ in samples of $2\nu$ and $2\nu+1$, while

$$E(m'^2) = \frac{3\nu^3 + 2\nu^2 + 1}{3\nu(2\nu+1)(2\nu+2)} \quad \text{in a sample of } 2\nu$$




and

$$E(m'^2) = \frac{2\nu(3\nu^2 + 5\nu + 1)}{3(2\nu+1)^2(2\nu+3)} \quad \text{in a sample of } 2\nu+1.$$

Hence

$$W(m') = \frac{\nu^2+\nu+1}{6\nu^3} \quad \text{in a sample of } 2\nu$$

and

$$W(m') = \frac{\nu+2}{3\nu(2\nu+3)} \quad \text{in a sample of } 2\nu+1.$$

Hence the efficiency of the mean deviation from the median relative to that of the range
decreases to zero as the sample size increases to infinity.
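The closed forms for $W(m')$ and $W(w)$ can be confirmed from the rectangular moments alone. For a statistic $\sum c_i y_i$ the second moment is $k\{(\sum c_i)^2 + \sum c_i^2\}$ with $k = 4/\{(n+1)(n+2)\}$. The following is a minimal sketch; the helper names are hypothetical.

```python
from fractions import Fraction

def rect_W(coeffs):
    """W = var / mean^2 of sum c_i y_i for the rectangular population on (-1, 1)."""
    n = len(coeffs) + 1
    k = Fraction(4, (n + 1) * (n + 2))                 # E(y_i y_j), i != j; E(y_i^2) = 2k
    mean = Fraction(2, n + 1) * sum(coeffs)
    second = k * (sum(coeffs) ** 2 + sum(c * c for c in coeffs))
    return second / mean ** 2 - 1

def median_dev_coeffs(n):
    # n m' = sum_i min(i, n - i) y_i
    return [Fraction(min(i, n - i), n) for i in range(1, n)]

for nu in range(1, 15):
    assert rect_W(median_dev_coeffs(2 * nu)) == Fraction(nu**2 + nu + 1, 6 * nu**3)
    assert rect_W(median_dev_coeffs(2 * nu + 1)) == Fraction(nu + 2, 3 * nu * (2 * nu + 3))
    for n in (2 * nu, 2 * nu + 1):
        assert rect_W([1] * (n - 1)) == Fraction(2, (n - 1) * (n + 2))
```

Because $W(m')$ falls off like $1/n$ while $W(w)$ falls off like $1/n^2$, the ratio $W(w)/W(m')$ tends to zero. This is the statement in the text.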

The equality of mean values of $m'$ in samples of $2\nu$ and $2\nu+1$ which occurs above is not a
coincidence, but is true for all populations. The mean value of $m'$ in a sample of $2\nu$ is

$$\frac{1}{2\nu}\int_{-\infty}^{\infty}\Big[\sum_{r=1}^{\nu-1} r\,{}^{2\nu}C_r\{F^r(1-F)^{2\nu-r} + F^{2\nu-r}(1-F)^r\} + \nu\,{}^{2\nu}C_\nu F^\nu(1-F)^\nu\Big]dx.$$

Multiplication of the integrand by $F + (1-F)$ gives

$$\frac{1}{2\nu}\int_{-\infty}^{\infty}\Big[\sum_{r=1}^{\nu}\{F^r(1-F)^{2\nu-r+1} + F^{2\nu-r+1}(1-F)^r\}\{r\,{}^{2\nu}C_r + (r-1)\,{}^{2\nu}C_{r-1}\}\Big]dx$$

$$= \frac{1}{2\nu+1}\int_{-\infty}^{\infty}\Big[\sum_{r=1}^{\nu} r\,{}^{2\nu+1}C_r\{F^r(1-F)^{2\nu-r+1} + F^{2\nu-r+1}(1-F)^r\}\Big]dx,$$

which is the mean value in a sample of $2\nu+1$.
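The argument above holds for any population. A quick check on an asymmetrical one may still be reassuring. As an assumed example, take the exponential population $F(x) = 1 - e^{-x}$. Irwin's formula then gives $E(y_i) = {}^nC_i\,B(i+1, n-i) = 1/(n-i)$. Since $n m' = \sum \min(i, n-i)\,y_i$, the mean of $m'$ follows exactly. The function name is hypothetical.

```python
from fractions import Fraction

def mean_m_prime_exponential(n):
    # E(m') = (1/n) sum_i min(i, n - i) E(y_i), with E(y_i) = 1/(n - i) for F(x) = 1 - exp(-x)
    return sum(Fraction(min(i, n - i), n - i) for i in range(1, n)) / n

for nu in range(1, 20):
    assert mean_m_prime_exponential(2 * nu) == mean_m_prime_exponential(2 * nu + 1)
```

For example, samples of 4 and of 5 both give $E(m') = 7/12$.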

5. Normal population

With this population most of the integrals required cannot be evaluated, as far as one can 
tell, in terms of known functions, and some account will first be given of the calculations 
performed to give the results which follow. 

To evaluate the $\psi(i)$ $(i = 1, \ldots, 5)$, $F(x)$ was obtained from British Association Tables (1931) and
$F^i(x)\{1 - F(x)\}^i$ tabulated for $x$ at intervals of 0·1. These were integrated by Simpson's rule from 0 to the
value beyond which the integrand was less than $10^{-10}$, and checked by the 'three-eighths' rule.
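The quadrature just described can be sketched as follows. This is a modern stand-in that assumes `math.erfc` in place of the British Association tables. The step 0·1 and the cut-off $10^{-10}$ follow the text, and the helper names are ours. Because $F^i(1-F)^i$ is even, the integral over $(0, \infty)$ is doubled.

```python
import math

def F(x):
    # standard normal distribution function
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def Q(x):
    # 1 - F(x), evaluated directly to avoid cancellation in the tail
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def simpson(g, a, b, m):
    # composite Simpson's rule on [a, b] with an even number m of intervals
    h = (b - a) / m
    total = g(a) + g(b)
    total += 4.0 * sum(g(a + k * h) for k in range(1, m, 2))
    total += 2.0 * sum(g(a + k * h) for k in range(2, m, 2))
    return total * h / 3.0

def psi(i, h=0.1, tol=1e-10):
    """psi(i) for the normal population by Simpson's rule."""
    g = lambda x: (F(x) * Q(x)) ** i
    m = 2
    while g(m * h) >= tol:      # extend the range until the integrand is below tol
        m += 2
    return 2.0 * simpson(g, 0.0, m * h, m)
```

For $i = 1$ the exact value is $1/\sqrt{\pi}$, which gives an independent check on the first entry of Table 1.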

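On integrating by parts, and using $f'(x) = -x f(x)$ for the normal population, we have

$$\psi(a, 1) = \frac{1}{a+1} - \int_{-\infty}^{\infty} x\,F^a(x)\,[1-F(x)]\,dx.$$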
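This sketch assumes that the integration-by-parts result takes the form $\psi(a, 1) = 1/(a+1) - \int x\,F^a(x)\,[1-F(x)]\,dx$. That form follows for the normal population from $f'(x) = -x f(x)$. Under this assumption a single quadrature gives $\psi(a, 1)$. The quadrature uses Simpson's rule with step 0·1 on $(-9, 9)$, and the names are illustrative. The values can be compared with the first column of Table 1, where $\psi(1, 1) = \tfrac12$ and $\psi(1, 2) = \psi(2, 1) = 0·19550\,11094$.

```python
import math

def F(x):
    # standard normal distribution function
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def Q(x):
    # 1 - F(x), evaluated directly
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def simpson(g, a, b, m):
    # composite Simpson's rule on [a, b] with an even number m of intervals
    h = (b - a) / m
    total = g(a) + g(b)
    total += 4.0 * sum(g(a + k * h) for k in range(1, m, 2))
    total += 2.0 * sum(g(a + k * h) for k in range(2, m, 2))
    return total * h / 3.0

def psi_a1(a, h=0.1, L=9.0):
    """psi(a, 1) = 1/(a+1) - integral of x F^a(x) {1 - F(x)} dx (normal population)."""
    m = 2 * math.ceil(L / h)            # even number of intervals covering [-L, L]
    g = lambda x: x * F(x) ** a * Q(x)
    return 1.0 / (a + 1) - simpson(g, -L, L, m)
```

The integrand is not even here, so the whole interval $(-L, L)$ is used rather than doubling a half-range integral.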

The integrals

$$\int_{-\infty}^{\infty} x\,[2F(x) - 1]\,F^r(x)\,[1-F(x)]^r\,dx \qquad (r = 1, \ldots, 4)$$

were computed as above and the required integrals obtained by algebraic combination. For
$\psi(i, j)$ $(i, j \ge 2)$, $\int_x^{\infty}\{1 - F(y)\}^j\,dy$ was computed, using Cotes's formulae up to that founded on
an octic curve where necessary, and these functions were multiplied by $F^i(x)$ and integrated
again by Simpson's rule. To check the working, use was made of the identity

$$\psi(i)\,\psi(j) = 2\sum_{u=0}^{i}\sum_{v=0}^{j}(-1)^{u+v}\,{}^iC_u\,{}^jC_v\,\psi(i+u,\, j+v),$$

which is obtained by writing $\psi(i)\psi(j)$ as a double integral over $x$ and $y$, expanding $\{1 - F(x)\}^i$,
$\{1 - (1 - F(y))\}^j$ by the binomial theorem and changing the order of integration where necessary.
This involved every value except $\psi(1, 9)$, $\psi(3, 7)$ and $\psi(5, 5)$, and, allowing for the accumulation of
rounding-off errors in the use of the identity, agreement was such as to give confidence in the last
figure being not more than one or two units out, and that only for the larger values of $i, j$.
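The checking identity is easy to automate. The sketch below is illustrative and the names are hypothetical. It first verifies the identity exactly for the rectangular population, which is also symmetrical, so the identity applies. It then applies the case $i = j = 1$ to the normal entries of Table 1. That case needs only $\psi(1)$, $\psi(1, 1)$, $\psi(1, 2)$ and $\psi(2, 2)$, and both sides should equal $1/\pi$.

```python
from fractions import Fraction
from math import comb, factorial, pi

def identity_rhs(i, j, psi2):
    """2 sum_u sum_v (-1)^(u+v) iC_u jC_v psi(i+u, j+v), for a symmetrical population."""
    return 2 * sum((-1) ** (u + v) * comb(i, u) * comb(j, v) * psi2(i + u, j + v)
                   for u in range(i + 1) for v in range(j + 1))

# exact check on the rectangular population on (-1, 1)
psi_rect = lambda k: Fraction(2 * factorial(k) ** 2, factorial(2 * k + 1))
psi2_rect = lambda a, b: Fraction(4 * factorial(a) * factorial(b), factorial(a + b + 2))
for i in range(1, 5):
    for j in range(1, 5):
        assert psi_rect(i) * psi_rect(j) == identity_rhs(i, j, psi2_rect)

# normal population: the entries of Table 1 needed for i = j = 1
table = {(1, 1): 0.5000000000, (1, 2): 0.1955011094, (2, 2): 0.0501571621}
psi2_norm = lambda a, b: table[min(a, b), max(a, b)]     # uses psi(a, b) = psi(b, a)
lhs = 0.5641895835 ** 2                                  # psi(1)^2
rhs = identity_rhs(1, 1, psi2_norm)
```

The two sides agree to within the rounding of the tabulated ten-figure values.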

A further check was attempted by calculating from these integrals some other integrals 
calculated by Hojo (1931), one set being expressible as linear combinations of the others. (In 
his notation the integrals involved are 7 a ,..., T 7 , R t , ..., Jd 6 , 2 /i,..., a I 6 , -J 2 and Jf] . This did not, 
however, help very much, as his values are to eight decimal places only; also they differ from 
mine in the eighth place by one or two units in several cases, and where direct evaluation is 
possible my value proves to be the correct one. 

The values of the integrals $\psi(i)$ and $\psi(i, j)$ are given in Table 1; $E(y_i)$ and $E(y_i y_j)$ for sample
sizes 2 to 10, calculated from formulae (2), (3) and (6), are given in Table 2; and the coefficients
for $d$, the most efficient unbiased linear estimate of standard deviation, are given in
Table 3. Values which are algebraically identical (e.g. $E(y_i)$ and $E(y_{n-i})$) are given once only.
In Table 4 are given the efficiencies of a number of statistics relative to that of the maximum
likelihood estimate. For the latter, I have taken the unbiased estimate derived from the sum
of squares about the sample mean, viz.

$$\frac{\Gamma\{\tfrac12(n-1)\}}{\sqrt{2}\,\Gamma(\tfrac12 n)}\,\sqrt{\Sigma(x-\bar{x})^2}.$$


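Thus, for example, the efficiency of $d$ is

$$\frac{\{E(d)\}^2}{\operatorname{var} d}\left[\frac{n-1}{2}\left\{\frac{\Gamma(\tfrac12(n-1))}{\Gamma(\tfrac12 n)}\right\}^2 - 1\right].$$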
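As a worked instance of the efficiency formula, the entries of Table 4 for $n = 2$ and $n = 3$ can be recomputed from Table 1. For $n = 3$ every symmetrical linear estimate is a multiple of the range $w = y_1 + y_2$. We then have $E(w) = 3\psi(1)$, $E(y_1^2) = 6\psi(1, 2)$ and $E(y_1 y_2) = 6\{\psi(1, 1) - 2\psi(1, 2)\}$. The sketch below is illustrative and the names are ours.

```python
from math import gamma

def W_ml(n):
    # var/mean^2 for Gamma((n-1)/2) / {sqrt(2) Gamma(n/2)} * sqrt(sum of squares)
    return (n - 1) / 2 * (gamma((n - 1) / 2) / gamma(n / 2)) ** 2 - 1

psi1, psi11, psi12 = 0.5641895835, 0.5, 0.1955011094     # Table 1 entries

# n = 2: d is a multiple of y_1, with E(y_1) = 2 psi(1) and E(y_1^2) = 4 psi(1, 1)
W_d2 = 4 * psi11 / (2 * psi1) ** 2 - 1

# n = 3: the range
Ew = 3 * psi1
Ew2 = 2 * 6 * psi12 + 2 * 6 * (psi11 - 2 * psi12)
W_range3 = Ew2 / Ew ** 2 - 1

efficiency2 = W_ml(2) / W_d2          # should be 1
efficiency = W_ml(3) / W_range3       # should round to 0.9919
```

Expressed as a percentage, the second value rounds to the 99·19 shown for $n = 3$.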


Standard errors of the maximum likelihood estimate, the range estimate and the mean 
deviation estimate, denoted by cr fi , cr ea and er et respectively, were compared by Davies & 
Pearson (1934, see top row of their Table I). 


Table 1. Integrals ψ(i) and ψ(i, j) of power products of normal tail areas
(equations (1) and (4))

   i              1               2               3               4               5
 ψ(i)      0·56418 95835   0·09900 37941   0·02015 40834   0·00435 75543   0·00097 35536

 ψ(i, j)
 j \ i            1               2               3               4               5
   1       0·50000 00000
   2       0·19550 11094   0·05015 71621
   3       0·11216 77761   0·02132 02754   0·00720 90995
   4       0·07565 42297   0·01138 07208   0·00319 21562   0·00120 76002
   5       0·05580 73500   0·00693 43673   0·00165 92449   0·00054 77852   0·00022 04352
   6       0·04357 46098   0·00400 22187   0·00095 98747   0·00028 10310
   7       0·03538 45805   0·00324 55847   0·00059 95945
   8       0·02956 98893   0·00239 45873
   9       0·02525 68141


Biometrika 36


7 




Table 2. Moments and product moments of rank differences in normal samples

[Numerical entries, E(y_i) and E(y_i y_j) for n = 2 to 10, are not legible in the scanned text.]



As far as extrapolation is possible from such small sample sizes, it seems that the range is
the best statistic in current use only for sample sizes up to 6 (offsetting ease of computation
against a slight loss of efficiency for n = 6). Further, it seems that the mean deviation from the median
is less efficient than the mean deviation from the mean, but that the ratio of the efficiencies is
not less than 0·945 (which occurs when n = 4). The figures for the efficiency of d show that it is
possible to choose from among linear estimates a statistic much more efficient than any
now in use. The possibilities of quasi-ranges remain uncertain, and they might repay
investigation for larger sample sizes. If the above methods are used, however, it will be
necessary to compute ψ(i) and ψ(i, j) to more places, as the chief loss of accuracy is from
the large number of additions and subtractions of multiples of these that are involved.

Table 3. Coefficients, a_i, in the most efficient unbiased linear estimate of standard deviation
in samples from a normal population

 Sample      Coefficient of
 size, n     y_1             y_2             y_3             y_4             y_5
    2        0·88622 69255
    3        0·59081 79502
    4        0·45394 0395    0·56412 1139
    5        0·37238 157     0·50759 551
    6        0·31752 48      0·45608 45      0·49929 61
    7        0·27781 06      0·41290 78      0·47537 14
    8        0·24758 6       0·37703 4       0·44834 3       0·47130 0
    9        0·22373         0·34700         0·42210         0·45807
   10        0·20438         0·32158         0·39784         0·44142         0·45564

Note. The estimate is d = Σ_{i=1}^{n-1} a_i y_i, where a_i = a_{n-i}.


Table 4. Percentage efficiencies of various estimates of σ in a normal population

 Sample
 size, n      d        m        m'       w       w_{n-2}   w_{n-4}   w_{n-6}
    2*     100·00   100·00   100·00   100·00
    3*      99·19    99·19    99·19    99·19
    4       98·92    96·39    91·25    97·52    25·24
    5       98·84    94·60    93·84    95·48    39·97
    6       98·83    93·39    90·25    93·30    49·55     13·49
    7       98·86    92·54    91·78    91·12    56·14     23·88
    8       98·90    91·90    89·70    89·00    60·85     32·05      9·03
    9       98·9     91·4     90·7     86·9     64·3      38·0      16·7
   10       99·0     91·0     89·4     85·0     66·8      43·8      23·3

Notation: d denotes best linear estimate; m denotes mean deviation from mean;
m' denotes mean deviation from median; w denotes range.

* For n = 2 the maximum likelihood estimate and all the linear ones are multiples of y_1. For n = 3 all
the linear estimates have similar distributions. Consequently in these two cases the efficiencies are
identical.

I am grateful to Prof. Pearson for suggesting a number of improvements to the original 
draft of this paper. 


REFERENCES

British Association (1931). Mathematical Tables, 1.

Davies, O. L. & Pearson, E. S. (1934). J. R. Statist. Soc. Suppl. 1, 76-93.

Godwin, H. J. (1946). Ann. Math. Statist. (at press).

Hartley, H. O. (1942). Biometrika, 32, 309-10.

Hastings, C., Mosteller, F., Tukey, J. W. & Winsor, C. P. (1947). Ann. Math. Statist. 18, 413-26.

Hojo, T. (1931). Biometrika, 23, 315-60.

Irwin, J. O. (1925). Biometrika, 17, 100-28.

Jones, H. L. (1948). Ann. Math. Statist. 19, 270-3.

Mosteller, F. (1946). Ann. Math. Statist. 17, 377-408.

Nair, K. R. (1947). Biometrika, 34, 360-2.

Nair, K. R. (1948). Biometrika, 35, 118-44.

Pearson, E. S. (1926). Biometrika, 18, 173-94.

Pearson, E. S. (1942). Biometrika, 32, 301-8.

Pearson, K. (1920). Biometrika, 13, 113-32.

Pearson, K. & Pearson, M. V. (1931). Biometrika, 23, 364-97.

Tippett, L. H. C. (1925). Biometrika, 17, 364-87.



[ 101 ] 


ON THE RECONCILIATION OF THEORIES OF PROBABILITY 

By M. G. KENDALL 
Introduction 

1. Few branches of scientific method have been subject to so much difference of opinion
as the theory of probability. Even when we put aside numerous points of taste in presentation
or axiomatization there remains a stubborn residual variance of viewpoint between different
authorities. Everyone agrees that this is undesirable; nobody yet, I think, has dared to
maintain that it is avoidable. But that is the theme of the present article. I wish to show that
there is nothing necessarily incompatible in the varying views which are currently held;
that the authorities are either saying the same thing in different ways or can only disagree
because of avoidable latent differences in their premisses or their field of discussion. History,
both past and present, teaches us that the role of mediator between contestants is neither safe
nor profitable, and I am quite prepared to incur general disfavour for maintaining that most
authors have some of the right but that none has a monopoly of it. In plunging into a
controversial subject one must run the risk of controversy. I can only say that I have tried not
to excite it.*

2. It is convenient to consider the subject under three heads, Foundations, Direct Theory
and Inverse Theory. The greatest differences of opinion (or, at least, those which have
generated the most heat) have arisen in the third, but there are substantial differences concerning
the first. Even the second, as I shall point out, is not free from problems of a character
similar to those arising in the other two, though the fact is not normally given much
prominence and perhaps has not been generally appreciated.

Frequency and non-frequency theories 

3. Although there are many shades of opinion about the appropriate foundations of a 
theory of probability we may broadly distinguish two main attitudes. One takes probability 
as ‘a degree of rational belief’, or some similar idea, and in enunciating the foundations of 
the subject does not attempt to analyse it into simpler ideas; it is therefore necessary to 
agree upon certain axioms and postulates concerning probability itself before a definite 
theory can be founded. The second defines probability in terms of frequencies of occurrence 
of events, or by relative proportions in ‘populations’ or ‘collectives’; and attempts to base 
the theory on familiar concepts without invoking a special indefinable ‘probability’. In 
practice both approaches seem to lead to the same kind of direct theory, and it would be 
rare for exponents of the two to reach different mathematical conclusions from the same
premisses. But this pragmatic reconciliation, though perhaps comforting to the onlooker, 
is not enough. The difference of viewpoint runs down to the heart of the subject and must be 
resolved if possible, not merely for intellectual comfort, but because it affects the theory of 
inference in which different authorities sometimes do reach different conclusions. 

* The first personal pronoun occurs more frequently in this article than good taste in objective
writing would normally permit. The reason is that I wish to leave no doubt about which
statements are advanced as personal views.



102 Reconciliation of theories of probability

4. The first point to establish is that the current methods of axiomatizing the calculus 
of probabilities are not sufficient to axiomatize a theory of probability. The pure mathematics 
of a calculus of probabilities proceeds from certain basic rules (such as those governing the 
addition or multiplication of probabilities of independent events) which are usually laid down 
without much discussion of their rationale. The simplest approach is exemplified by the 
procedure adopted in most text-books of algebra, wherein the probability of success of an 
event which can happen favourably in m out of n possible ways is defined as the ratio m/n.
A more sophisticated approach is used by Kolmogoroff (1933) and Cramér (1945), for
example, by relating probability to the measure of a set of points. This is probably adequate
for the mathematician, who is more concerned with working out the logical consequences of
his assumptions than with relating this calculus to the physical world. It fails, however, to
solve the problem with which we are here concerned, namely, to found a theory of probability
as a branch of scientific method. In other branches of science it may be enough to set up a
mathematical model and to accept it if it gives a reasonably good account of observation;
but in the most general sense we are here considering how good a 'reasonably good' agreement
must be.

5. It has sometimes been suggested that the domain in which the statistician is interested 
is only part of the whole domain covered by the phrase 'uncertain inference'; that, for
example, other people may reasonably be concerned with gauging the degree of doubt in
propositions concerning unrepeatable events such as ‘Homer was blind’, whereas the 
statistician operates with series of similar propositions or events because his science is the 
study of aggregates. The suggestion is, I think, that the statistician can avoid some of the 
difficulties of uncertain inference by confining himself to classes of propositions concerned 
with experimentally repeatable events. It might even be questioned whether ‘probability’ 
as the statistician uses the word has the same connotation as when it is used by logicians or 
scientists in relation to individual hypotheses. Perhaps the psychologist can answer this 
question for us. My own opinion is that there is no essential difference (other than that of 
intensity) between the attitudes of doubt towards any propositions of whose truth we are 
uncertain. Something may be said for distinguishing those hypotheses which are capable of 
experimental verification from those which are not; it might even be (though I do not believe 
it) that different kinds of uncertain inference are appropriate to the two cases; but it does not 
appear to me that the statistician solves any essential problems by confining himself 
to special classes of case. In practice he is concerned, like any other scientist, with the
formation of opinions and the taking of decisions on incomplete evidence, and the mere fact 
that some of his data can be counted or measured does not, I maintain, relieve him of any 
major problem arising in the justification of his inferential processes. 

6. I assert that any theory of probability which does not take probability itself as a primitive
idea must, in some form or other, introduce an equivalent primitive before it can be applied.

The rather naive approach of the algebraic text-book, for example, may take one of two
main forms: it can state that if, of a set of n mutually exclusive propositions, m are favourable,
the probability that a favourable proposition is true is m/n; or it can state that if m out of n
equally probable and mutually exclusive propositions are favourable and one must be true,
the probability of a favourable proposition is m/n. The first form is untrue (a man can be
unidextrous in two ways, as right- or left-handed, but the probability is not 1/2 that a man is
right-handed in any theory that I know); the second begs the question by the use of the
expression 'equally probable'. There are, of course, variants on the way in which this idea



M. G. Kendall


103 


is put: we may speak of events instead of propositions, or refer to events happening 'at
random'. But they all come to much the same thing so far as concerns the point now under
discussion. The concept of probability contains more than the mere idea of proportionality 
of cases. And, of course, the same argument applies when proportionality is replaced by the 
measure of a set or some more refined mathematical concept. 

7. The mathematician may fairly contend that such matters are not his concern, any more
than the Euclidean geometer is concerned with the question whether there exist in practice 
objects which can be regarded as straight lines. One must be a little careful about arguing 
from the analogy of other branches of applied mathematics because at this stage we are
concerned not only with the relationship of theory with the external world but also with the 
relationship between our calculus and the way we think. Even if we pass over such a point, 
however, and admit the mathematician’s right to deny interest, qua mathematician, we must 
accept responsibility for resolving the difficulty in some other capacity, as psychologists, 
logicians, or statisticians, unless we are prepared to relegate probability to the domain of 
pure mathematics and the possibility of its application to the phrontisteries of our academies. 

8. Von Mises (1928) has made a valiant attempt to provide a frequency theory of
probability, and for a time I was almost convinced that he had succeeded. Myself when young
did eagerly frequent. But his theory fails, in my view, in much the same way as the other
theories mentioned in § 6. He has, in fact, to introduce the idea of an 'Irregular Kollektiv'
(English writers would call it a random series) or a 'Prinzip des ausgeschlossenen
Spielsystems' (the impossibility of a winning system in games of chance). The 'Irregular
Kollektiv' itself is a new concept outside ordinary mathematics. Attempts by various writers to
show that there exist sequences of numbers with the necessary properties break down, I hold,
on the point that such sequences can only be shown to obey an enumerable set of conditions,
whereas the 'Kollektiv' must obey an innumerable set (unless the definition is to be modified,
in which case we arrive at the repugnant conclusion that the laws of probability may be
obeyed sometimes, but not always and not even, relatively speaking, frequently).

9. I myself do not believe that anyone will ever succeed in producing a theory of
probability which does not, at some point, require a primitive idea equivalent to that of
probability itself. It is necessary, I hold, to have some concept of randomness, haphazardness
or uncertainty of happening inherent in the subject which either must be introduced
explicitly into the axiomatization or, if omitted, must be introduced later before the theory is
capable of application. Precisely where the idea is introduced is to some extent a matter of
taste or didactic convenience. But appear it must. There can, I assert, be no pure frequency
theory of probability any more than (to anticipate a later argument) there can be a useful
objective theory of probability without reliance at some point on empirical justification in
terms of frequency.

10. One point, however, is worth examination at this stage. Must we take as our primitive 
a general idea of probability which permits of the comparison of any two propositions, or is 
it sufficient to use only the idea of equal probability? The axiomatization referred to in § 6
suggests that equi-probability is sufficient if we can analyse our alternative possibilities to
the point where they all stand on the same footing. If we can, as it were, break down our
situation into atomic propositions with equal probability, a mathematical measure of the 
probability of a subset follows readily enough. 

11. At first sight this does not appear to be possible. The probability that a new-born
child is male is (say) 0·51 but it does not look as if we can regard this as one of a subset of




51 favourable propositions out of 100 equi-probable propositions. But I think this is not 
a fatal objection. We arrive at the probability of 0·51 by counting cases, and it is assumed
in doing so that the probability of occurrence of a male in each case is the same; in fact, we do 
base our probability on equi-probable events, and the same is true of any probability based 
on statistical frequencies. I am not prepared to say that we can in every instance schedule 
the equi-probable events explicitly. General judgements in probability often have to be 
made on the basis of a ‘feeling of the situation’ which is compounded of a multitude of 
relevant factors half-remembered or not explicitly remembered at all. There is, as Jeffreys 
has pointed out, an element of uncertainty attributable to the imperfection of the human 
mind itself. I am, nevertheless, inclined to think that a satisfactory theory can be founded 
merely on the notion of equi-probability or the equivalent notion of randomness. But it 
is not necessary to insist on the point. What I do insist upon is the necessity for a primitive 
idea of probability or randomness of some kind. 

12. It might be thought that the differences between the frequentists and the non- 
frequentists (if I may call them such) are largely due to the difference of the domains which 
they purport to cover. I assert that this is not so. One of the principal modern advocates of 
the non-frequency approach is Jeffreys, who takes probability as a measure of belief and is 
concerned with the application of his theory to scientific inference in general. But practically 
every example in Jeffreys’s book can be treated by the methods of the frequentists (or so they 
would claim). The fact is that in practice both schools deal with the same kind of problem. 
They differ because they approach the same problems differently, not because they deal 
with different problems. 

13. The essential distinction between the frequentists and the non-frequentists is, I think, 
that the former, in an effort to avoid anything savouring of matters of opinion, seek to define 
probability in terms of the objective properties of a population, real or hypothetical, whereas 
the latter do not, and, indeed, sometimes go further and repudiate the introduction of 
a population as irrelevant, incompetent and immaterial. To revert to the example of the 
sex-ratio in births, the frequentist would, at the outset of the inquiry, postulate the existence 
of a probability p that a birth was male in some population or ‘Kollektiv’ and then proceed 
to estimate it in the light of experience. The non-frequentist would begin by assuming a prior 
probability for the ratio and would then modify it in the light of the posterior probabilities 
given by observation. If the observations are numerous his prior probabilities dwindle into 
insignificance and he gets much the same formula as the frequentist. But he is not, on the 
face of it, trying to do the same thing. The frequentist is estimating an unknown constant; 
the non-frequentist is determining a probability which is not a constant, but varies according 
to the state of his knowledge.
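The remark that numerous observations make the prior "dwindle into insignificance" can be illustrated under an assumption the text does not make: a conjugate Beta prior for the proportion of male births. The prior Beta(51, 49), with mean 0·51 and the weight of 100 births, is hypothetical. So is the observed frequency of 0·52. The posterior mean $(\alpha + \text{males})/(\alpha + \beta + \text{births})$ approaches the observed frequency as the number of births grows.

```python
from fractions import Fraction

def posterior_mean(alpha, beta, males, births):
    # mean of the Beta(alpha + males, beta + births - males) posterior
    return Fraction(alpha + males, alpha + beta + births)

alpha, beta = 51, 49                 # hypothetical prior: mean 0.51, weight of 100 births
for births in (100, 10_000, 1_000_000):
    males = births * 52 // 100       # hypothetical observed frequency of 0.52
    freq = Fraction(males, births)
    post = posterior_mean(alpha, beta, males, births)
    print(births, float(freq), float(post), float(post - freq))
```

The gap between the posterior mean and the frequency shrinks roughly in proportion to $1/\text{births}$.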

14. Suppose I draw a random sample of 1000 births and find that 600 are male. Do
I then infer (since this differs significantly from what could happen, to an acceptable degree
of probability, in sampling from a population wherein the proportion of males is 51 %) that
the proportion of males cannot be 51 %? Most assuredly I do not. There is so much prior
evidence in favour of a ratio of about 51 % as against a ratio of 60 % that I require much more
posterior evidence than this to shake my prior probability seriously. My previous knowledge
must affect my judgement. The frequentist cannot, I think, question this. He must then do
one of two things. He must either admit that probability is influenced by prior knowledge
if that probability is by itself to be the basis of a rational judgement or course of action; or 
he must concede that the final judgement is based on a mixture of probability and some 



M. G. Kendall 


105 


different quality of uncertain inference which conditions it. It seems to me that if he takes
the second course he lays himself open to the criticism that his theory fails to solve the
problem with which he is concerned. One does not obtain an objective theory of inference
by separating it into an objective and a non-objective part and ignoring the second. Perhaps
the frequentist can maintain that his theory is consistent and logical. He cannot maintain 
that it provides more than a part of the answer to the fundamental question. 
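The phrase 'differs significantly' in § 14 refers to a calculation that can be made explicit. The sketch below is a minimal version of it. It assumes the normal approximation to the binomial, because the text does not say which test it has in mind. The function name `binomial_z` is illustrative only.

```python
import math

def binomial_z(successes, trials, p0):
    """Standardized deviation of an observed count from its expectation under p0."""
    expected = trials * p0
    sd = math.sqrt(trials * p0 * (1 - p0))
    return (successes - expected) / sd

# 600 males in 1000 births, tested against the proportion 0.51 of section 14.
z = binomial_z(600, 1000, 0.51)

# Two-sided tail area under the normal approximation (an assumption of this sketch):
# P(|Z| >= z) = erfc(|z| / sqrt(2)).
p_value = math.erfc(abs(z) / math.sqrt(2))
print(round(z, 2), p_value)
```

The deviation is more than five standard deviations. The argument of § 14, however, is that this alone should not overturn the prior evidence for a ratio near 51 %.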

15. I shall revert to this point later in considering the statistical theory of confidence
intervals. It is sufficient for the moment to record the point that in arriving at a probability
of an observed event even the frequentist must make some assumption about prior
probabilities. This may, in the last analysis, mean no more than that he has to postulate of his
observations that they conform to a random process of specified type; but that itself
is an assumption at least of equi-probability, which is an assumption concerning prior
probabilities. I therefore assert that, to assign a probability in any practical case, the frequentist
as well as the non-frequentist requires a prior probability distribution from which to start.

16. This raises two problems: (a) how, in any given situation, we determine the prior
probabilities, and (b) how we justify the contention that what we are doing has any
application in real life. 

The precise determination of prior probabilities may be a matter of practical difficulty,
but I do not think this constitutes an objection. We may have a fair idea of the temperature
of a room without being able to express our impression very precisely in degrees on a
thermometric scale; and similarly, we may have a rough idea of measurable prior probabilities
without being able to say exactly what they are. But if we pursue these general impressions
back to their source, how do they arise? Not, I think, by any innate knowledge, if there is
such a thing, but from even more prior probabilities. The prior values which we assign on
any given occasion are themselves posterior values of a previous experience. Where did
they start? 

17. It is difficult (but not impossible) for an adult to find any situation in which he has
no prior knowledge whatever to influence his assessment of a probability. There may,
however, have been moments in his childhood when his ignorance was complete. This is not, so
far as I can see, necessarily so, because in the very act of learning the meaning of a proposition
he may have acquired some expectation as to its truth. It is for the psychologist to explore 
these topics. For my present purpose I need only emphasize that prior probabilities are 
themselves built up from experience, albeit an unremembered and unconscious experience. 

18. Jeffreys and some other writers have endeavoured to meet the ‘situation of initial
ignorance’ by laying down certain rules determining the types of prior probability
distribution to be adopted in specified cases. When Jeffreys’s book first appeared this seemed to me
the least convincing part of his treatment, and it seems so still. In fact, my doubts increase
as the rules proposed for different situations multiply. It cannot be necessary to adopt
peculiar rules for prior distributions in order to say that we know nothing. As I understand
him, Jeffreys argues that his rules are necessary in some cases because otherwise there would
arise mathematical difficulties. But this seems to me a circular argument, and more a ground
for examining the mathematical treatment than a justification of the theory. In reading
some of the papers in this vein one sometimes has difficulty in resisting a cynical suspicion
that certain kinds of distribution are introduced because they give the right kind of
answer, which may be a posterior justification but is unsatisfactory in laying down the bases
of a subject.



106 


Reconciliation of theories of probability


19. I assert: 

(a) That the assignment of a probability to observed phenomena requires in all cases the
assignment of a prior probability or some equivalent procedure such as the assumption that the
generating process is random.

(b) That situations rarely, if ever, arise in which there is no knowledge of prior probabilities.

(c) But that, if such a situation arises, the only possible rule to use is that of Bayes in which 
all the possibilities are given the same prior probability. 

Let me add two scholia: 

(i) Some difficulties arise over the mathematical treatment when we are considering 
parameters which may have an infinite range or even when they are continuous over a finite 
range. These difficulties, I hold, are mathematical in the sense that ideas of continuity and 
limiting processes are mathematical. They raise problems, but the existence of such problems
is not very relevant to the basis of the theory of probability. 

(ii) As data accumulate the posterior probability dominates the total probability so that 
it makes very little difference what the prior probabilities were. This is an argument for using 
prior distributions which are plausible but inexact if they make the mathematics easier, 
at least for large samples. It in no way affects the points now under discussion. 
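Scholium (ii) can be illustrated with conjugate Beta priors for a binomial proportion. This prior family is chosen here purely for convenience and is not taken from the text. Under a Beta(a, b) prior, the posterior mean after x successes in n trials is (a + x)/(a + b + n). Both priors and the 60 % data below are hypothetical.

```python
def beta_posterior_mean(a, b, successes, trials):
    """Posterior mean of a binomial proportion under a Beta(a, b) prior."""
    return (a + successes) / (a + b + trials)

# Hypothetical priors: uniform (Bayes's postulate) and one concentrated near 0.51.
priors = [(1, 1), (51, 49)]

diffs = []
for n in (10, 1_000, 100_000):
    males = 6 * n // 10                      # hypothetical data: 60 % male births
    m_uniform, m_informative = (beta_posterior_mean(a, b, males, n) for a, b in priors)
    diffs.append(abs(m_uniform - m_informative))
    print(n, round(m_uniform, 4), round(m_informative, 4))
```

As n grows, the gap between the two posterior means shrinks, for large n roughly in proportion to 1/n. This is the sense in which it makes very little difference what the prior probabilities were.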

20. So far my summing up has been in favour of the non-frequentists on the points
concerning the impossibility of founding the theory without some primitive idea of probability
and the necessity of using prior probabilities. I must now restore the balance, and show that
the non-frequentist requires notions of frequency in some form or other to make his theory
of any practical importance if it is to be a numerical theory.

The frequentist tries to give his probabilities objectivity by basing them on frequency of 
occurrence. The non-frequentist would also like to obtain objectivity, of course, but he is 
concerned with the measurement of an attitude of mind which is subjective. He may, as 
Keynes did, beg the question by speaking of degrees of rational belief as if rational minds 
could not differ in the assessment of a probability on the same evidence, which in the ultimate 
analysis implies that a belief founded on the same data is only rational if it agrees with one’s 
own. Jeffreys makes a better attempt, I think, by admitting that there is no logical answer 
to the solipsist but offering to believe in him on the basis of reciprocal aid. However, one must 
not make too much of this kind of point. We all assume that there are certain rules of thought 
which are common to all rational human beings. Let us then suppose that the rules of
probability are commonly accepted. Let us further suppose that on given data two different
individuals will arrive at the same assessment of prior probabilities. This is a very
considerable assumption to make because the ‘background’ of individuals may vary so much that
we cannot be sure that the data can ever be the same; consequently each individual may 
possess his own schedule of probabilities, and we do not emancipate ourselves from matters 
of opinion. However, in order to press on to the main difficulty, let us make the assumption. 
We then arrive at the fundamental question: what use is it to calculate a probability? 

21. There is no point in calculating probabilities as an arithmetical exercise unless we 
are willing to relegate the theory of probability to the position of a branch of pure mathematics.
They must either be pure measures of a mental attitude or they must correspond to something 
in the external world. If they are only measures of belief, they may still form the basis of 
rational actions and rational decisions. But why are they of any use in this respect unless 
our decisions or actions are improved by them—in fact, unless we are more often right by 
using them than by ignoring them? The frequentist does not encounter this difficulty, for
his probabilities purport to describe observed frequencies. The non-frequentist, it seems to 
me, needs a new basic assumption. 

I assert that, for a non-frequentist theory of probability to be applicable, it is necessary to assume
that propositions with greater probability are true more often in fact than propositions with less
probability.*

22. Certain minds, including some types of trained scientific mind, will instantly revolt
at any contention that there is a positive correlation between what we believe and what is
true. The scientist, very properly, is trained to suspect even his own beliefs and, remembering
that whole populations have believed that the world is flat or that heat is a fluid, is not even
very impressed by an overwhelming body of collective opinion. And yet the number of
instances in which people have believed something wrong is relatively small; and frequently
the erroneous propositions which have been most widely believed are those whose meaning
is in any case abstract and obscure. Every conscious movement I make is based on the
belief that what has happened before will happen again. The use of language itself is a
testimony to this belief, and anyone who denies it has to explain why he thinks that the sounds
he utters or the marks he inscribes on paper in making the denial will be understood. 

23. I am not arguing that there is any less reason than in the past to seek for evidence, 
and as much of it as one can get, before coming to conclusions. I am only saying, first, that 
in many ordinary decisions we already have strong prior probabilities, and secondly, that 
we may have to take decisions on very slight evidence or on prior probabilities which only 
just favour one course as against the alternative; and I ask, what grounds have we for
supposing that our probabilities are a good guide to conduct? It is here, I think, that the
frequentist can assert himself; for the only grounds are that we have found in the past that
it is so. Our reliance on our probabilities is based on the frequency with which we have found
it justified in the past. This, I think, is the pragmatic frequentist sanction for a
non-frequentist theory. It works.

24. The line of thought which leads me to suggest that a reconciliation of the frequentist
and non-frequentist views is possible should now be clear. The frequentist seeks for objectivity
in defining his probabilities by reference to frequencies; but he has to use a primitive idea
of randomness or equi-probability in order to calculate the probability in any given practical
case. The non-frequentist begins by taking probability as a primitive idea, but he has to
assume that the values which his calculations give to a probability reflect, in some way, the
behaviour of events. Frequentiam furca expellas, tamen usque recurret (you may drive out
frequency with a pitchfork, yet it will always come back). Neither party can
avoid using the ideas of the other in order to set up and justify a comprehensive theory.
I believe that if this is firmly grasped the fundamental differences vanish. There may still 
be room for argument about the best way of presenting the subject from the logician’s or 
the teacher’s point of view, but that is quite a different matter. 

Foundations 

25. In laying down the collection of axioms, postulates and general rules-of-the-game 
which provide the foundations of a theory, we have a substantial amount of choice which is 
exemplified by the very different treatments given by different authors under the title of 

* In this article I speak sometimes of probabilities of events, sometimes of propositions, sometimes of
propositions that are true, sometimes of events that will happen. Such terms require definition and
clarification in a highly rigorous exposition, but to have attempted such a thing here would have obscured
the main points I am making. I am satisfied that in omitting such a treatment I am not glossing over any
difficulties which affect my main conclusions.

‘Theory of Probability’. The mathematician is apt to skate rather lightly over the problems
of relationship with experience in order to press on to the calculus, which is his primary 
interest. The logician goes more deeply into the correct axiomatization but sometimes does 
not proceed to develop the theory to the point of showing its practical utility. One cannot 
complain of this, for only an encyclopaedist can do justice to the subject in its entirety, and 
even a collective work like Borel’s Traité leaves large areas untouched. One can indeed
complain of the lack of sympathetic understanding shown by some authors to the works 
of others; but it is unfortunately true that among the exponents of the science of objective 
judgement there seem to be as many emotional judgements as among less enlightened sections 
of the community.

26. The very extent of the domain to be covered requires an author to be selective, but 
there is a serious danger that his selection may give a bias to his treatment. For instance, 
it is a fairly common practice to begin with the throwing of dice or the tossing of coins as 
illustrations of the ‘kind of situation’ with which the probabilist is to deal, as if the theory 
of probability were the same thing as the doctrine of chances. It is not part of my purpose 
to discuss such matters on the present occasion, but it is fair to point out that a teacher ought 
to be very careful not to put only one side of the case to his students. Admittedly he must 
start with simple ideas and not destroy their self-confidence by leading them into the difficulties
at the beginning of the subject—as in mathematics itself, the fundamentals are the most 
difficult and the beginnings should come last. But in reading most modern treatments 
I cannot help feeling that a little more impartiality would be an improvement, and that an 
author should do more than ignore opposing views or mention them merely to refute them. 

27. My discussion of the frequentist and non-frequentist viewpoints in §§3-24 has left 
me little more to say about fundamentals. I have only two points to make. The first concerns 
the introduction of probabilities associated with continuous variables. From the idea of 
the probability of events we can build up the idea of the probability of variate values, and 
hence the idea of a probability function in the discontinuous case. It is then tempting for 
the mathematician to jump to the continuous case without pausing very long for thought 
and to write such expressions as 

$$dF = f(x)\,dx, \qquad (1)$$

which purport to express the fact that the element of probability dF in the range dx is
f(x) dx. A variate transformation is then made by writing x = x(ξ), and we have

$$dF = f\{x(\xi)\}\,\frac{dx}{d\xi}\,d\xi, \qquad (2)$$

expressing that the probability in the range dξ is f{x(ξ)} (dx/dξ) dξ.

The retention of the differential element is thus very important in the expression of a 
probability. It is well known that there is something arbitrary in the determination of 
probabilities in a continuum. But on occasion we ignore the differentials and concentrate 
on the frequency function f, as, for example, in methods involving the likelihood function.
There are contexts where it is important to remember that likelihoods in this sense are not 
invariant under variate transformations, so that tests based on likelihood ratios contain an 
implicit assumption about the random process generating the observations. 
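The role of the differential element in equation (2) can be checked numerically. The sketch below assumes, purely for illustration, that x has density f(x) = e^(−x) for x > 0 and uses the transformation x = ξ². The transformed expression integrates to one only when the factor dx/dξ = 2ξ is kept. The helper names are hypothetical.

```python
import math

def f_x(x):
    """Hypothetical density of x: f(x) = exp(-x) for x > 0."""
    return math.exp(-x)

def midpoint_integral(g, a, b, n=100_000):
    """Midpoint-rule approximation to the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# Hypothetical transformation x = x(xi) = xi**2 (xi > 0), so dx/dxi = 2 * xi.
with_jacobian = midpoint_integral(lambda xi: f_x(xi ** 2) * 2 * xi, 0.0, 10.0)
without_jacobian = midpoint_integral(lambda xi: f_x(xi ** 2), 0.0, 10.0)

print(round(with_jacobian, 4))     # 1.0: a proper distribution of xi, as in (2)
print(round(without_jacobian, 4))  # 0.8862 (about sqrt(pi)/2): the differential was dropped
```

If the Jacobian is dropped, what remains is a function of ξ whose total is about √π/2 rather than 1. This is the arbitrariness that arises when the differential is ignored and attention is confined to the frequency function alone.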

28. My second point is that in setting up a theory the non-frequentist has to take one 
hurdle which the frequentist by-passes. It is necessary for him to establish that his
probabilities are measurable on a numerical scale unless (like Keynes and some other writers
on logic) he is content to see his theory so indefinite that practical application is confined 
within very narrow limits. Most authors are prepared to make whatever assumptions are 
necessary to permit their probabilities to be numerically measurable. There is evidently 
some latitude of choice in postulates for the purpose. The simplest course is merely to 
postulate that probability is measurable. Jeffreys starts from a slightly anterior point and 
requires the axiom that of two probabilities p and q, p is either greater than, less than, or 
equal to q. This puts his probabilities in order; he requires a further axiom that if
probabilities are put in order they can be associated by a (1, 1) correspondence with a set of real
numbers in increasing order. This ensures that there are enough numbers for the purpose,
and he then calibrates his scale by reference to the theorem that of n equally probable and
exclusive events the probability of a subset m is m/n, and hence arrives at the theorem that
any probability can be expressed by a real number. There may be other ways of arriving at
the same result, but it appears unlikely to me that any of them could avoid making
assumptions equivalent to those of Jeffreys.


Direct theory 

29. The direct theory of probability, as I am using the expression, consists mainly of the 
formulation of rules for building up from the probabilities of simple propositions the
probabilities of more complex collections of propositions. So far as I know there are no serious
difficulties or differences of opinion about the simpler rules such as those expressing the 
probabilities of the sum of propositions. Nor are there any problems, other than those of 
pure mathematics, once certain fundamental rules have been established. Such valuable 
and ingenious little books as Whitworth’s Choice and Chance contain many problems, 
but they are all essentially mathematical. There is, however, one rule which offers a stumbling 
block and merits particular attention, namely, the product rule which is usually expressed 
in some form such as 

$$P(pq \mid h) = P(p \mid h)\,P(q \mid ph) = P(q \mid h)\,P(p \mid qh), \qquad (3)$$

that is to say, the probability of both p and q on data h is the product of the probabilities of 
p on h and of q on p and h (or, equivalently, of q on h and p on q and h). I find it a useful 
practice, whenever meeting with a new book on probability, to go straight to the derivation 
of the product rule. It provides a kind of touchstone for the author’s whole treatment. 

30. For the frequentist the derivation is simple. If, of a set of n equi-probable and exclusive
propositions, k are favourable to p and q, l to p and m to q, we have

$$\frac{k}{n} = \frac{l}{n}\cdot\frac{k}{l} = \frac{m}{n}\cdot\frac{k}{m}, \qquad (4)$$

which establishes the proposition. There can be little doubt, I think, that the simple
arithmetical identities of equation (4) are the basis of our readiness to accept the product-rule as
generally valid in any theory. The idea of equi-probability or randomness contains within 
it the product-rule for a finite number of alternatives. There may be difficulties in dealing 
with limiting cases where infinities are involved, but I think they are far from insuperable 
and need not be brought up here to confuse the issue. 
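The arithmetical identities of equation (4) can be verified by direct enumeration. As a hypothetical case, the sketch below uses the 36 equally probable and exclusive outcomes of two dice. It takes p = 'the first die shows an even number' and q = 'the total is at least 8', then counts k, l and m as in § 30.

```python
from fractions import Fraction
from itertools import product

# Hypothetical illustration: the 36 equally probable, exclusive outcomes of two dice.
outcomes = list(product(range(1, 7), repeat=2))

def p(o):  # proposition p: the first die shows an even number
    return o[0] % 2 == 0

def q(o):  # proposition q: the two dice total at least 8
    return o[0] + o[1] >= 8

n = len(outcomes)
l = sum(1 for o in outcomes if p(o))            # favourable to p
m = sum(1 for o in outcomes if q(o))            # favourable to q
k = sum(1 for o in outcomes if p(o) and q(o))   # favourable to p and q

lhs = Fraction(k, n)                        # P(pq | h)
via_p = Fraction(l, n) * Fraction(k, l)     # P(p | h) P(q | ph)
via_q = Fraction(m, n) * Fraction(k, m)     # P(q | h) P(p | qh)
print(k, l, m, n, lhs, via_p == lhs == via_q)   # prints: 9 18 15 36 1/4 True
```

With exact fractions, both factorizations reproduce k/n. This is simply the counting argument of (4) carried out on one finite set of alternatives.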

31. The non-frequentist has a much harder task to establish the rule. Johnson (Logic, vol. …)
treats it as an axiom, and his view is important because he influenced both Keynes
and Jeffreys. Keynes himself, I think, fell into error on this point. He defines the product
of probable relations by

$$P(p \mid qh)\,P(q \mid h) = P(pq \mid h), \qquad (5)$$


where, it must be remembered, equation (5) is merely a rule for manipulating logical symbols, 
not the expression of numerical probabilities. Although there is some latitude of choice in 
which of our elementary propositions are introduced as definitions and which as postulates, 
it is repugnant to general expectation that an important rule such as the product-rule should 
be introduced by mere definition. I find Keynes’s subsequent attempt to establish a numerical 
theory of probability hard to follow, and I cannot see that in his theory the product-rule 
states anything more than that we may multiply numerical probabilities when we may, 
which is true but carries us no further forward. 

32. Jeffreys takes what I think is the only possible course and reverts to Johnson’s 
treatment by taking (3) as axiomatic.

Now the product-rule is not obvious. We are therefore entitled to consider why it should be 
introduced at all. The somewhat cynical reply that without it we could not get the answers 
we want is insufficient; we still have to explain why we want that kind of answer. Here, 
again, I think the frequentist comes into his own. The only justification for the product-rule 
that I know is based on the fact that it can be established for sets of equi-probable exclusive 
alternatives as in equation (4). I therefore assert that both frequentist and non-frequentist 
theories are compelled to rely, for the justification of the product-rule, on the properties of
equi-probable exclusive alternatives. We are, I hope, one stage nearer an understanding of how the
two approaches are interlocked. 


Inverse theory 


33. A precisian might reasonably contend that there is no such thing as ‘inverse’
probability, but the term is so widely used that one cannot disturb it. ‘Probabilité des causes’ (probability of causes)
comes rather nearer to expressing what we mean by it. The essence of the process lies in its 
attempt to reason from observation to the general law governing observation. In its broadest 
aspect it seeks to provide a science of induction. 

34. The non-frequentist can set up a theory of inverse probability, or perhaps it would 
be better to say a theory of inference, without a great deal of difficulty. Some frequentists 
would deny this, but I think the majority of the grounds of their denial would be found, on 
analysis, to relate to the more basic problems referred to in the earlier parts of this paper. 
It is, for example, legitimate for the non-frequentist to consider the prior probability
distribution of an unknown parameter (which may be a natural constant such as the mass of
a proton) and to modify it in the light of experiment according to the relation 

$$\text{posterior probability} \propto \text{prior probability} \times \text{likelihood}. \qquad (6)$$

In this way the non-frequentist can proceed, by continued experiment if necessary, to
concentrate his probabilities in narrowing ranges, or in short, to get continually nearer to the
truth; always provided that he can obtain a prior probability to start with.
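Relation (6) and the 'narrowing ranges' of § 34 can be sketched on a grid of candidate values of a binomial proportion. The sketch assumes Bayes's postulate, so every grid point gets equal prior weight, and uses hypothetical batches of 200 births with 102 males each. The grid spacing, the 95 % central interval and all function names are illustrative choices, not taken from the text.

```python
import math

# Candidate values of the proportion p (grid spacing 0.0005 is an illustrative choice).
GRID = [i / 2000 for i in range(1, 2000)]

def update(log_post, males, births):
    """Relation (6) on the log scale: add the binomial log-likelihood to the log prior."""
    return [lp + males * math.log(p) + (births - males) * math.log(1 - p)
            for lp, p in zip(log_post, GRID)]

def normalise(log_post):
    """Turn unnormalized log weights into probabilities summing to one."""
    top = max(log_post)                       # subtract the maximum to avoid underflow
    w = [math.exp(v - top) for v in log_post]
    total = sum(w)
    return [v / total for v in w]

def central_interval(probs, mass=0.95):
    """Grid points bounding the central `mass` of a discrete posterior."""
    tail = (1 - mass) / 2
    cum, lo, hi = 0.0, None, None
    for p, w in zip(GRID, probs):
        cum += w
        if lo is None and cum >= tail:
            lo = p
        if hi is None and cum >= 1 - tail:
            hi = p
    return lo, hi

log_post = [0.0] * len(GRID)   # Bayes's postulate: equal prior weight on every grid point
widths = []
for batch in range(1, 6):      # hypothetical data: batches of 200 births, 102 of them male
    log_post = update(log_post, 102, 200)
    lo, hi = central_interval(normalise(log_post))
    widths.append(hi - lo)
    print(batch * 200, round(lo, 4), round(hi, 4))
```

Each batch multiplies the current weights by a fresh likelihood. The interval therefore contracts steadily, from roughly 0.14 wide after 200 births to roughly 0.06 after 1000.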

35. It is in the problem of determining the initial probabilities that most of the modern 
controversy has centred. In fact, at least one statistical school of frequentists has felt the 
difficulty so acutely that much of their work on inference has been devoted to finding methods 
which avoid the use of prior probabilities altogether. In doing so they encounter troubles 
of their own which I discuss below. Even the non-frequentists, however, are not free from
troubles in formulating rules for the determination of prior probabilities. I have already
referred to the point in § 18. The postulate of Bayes, that if nothing is known about prior
probabilities they are to be assumed equal, appears to me, after many years’ reflexion, to be
inevitably right. When we are in a genuine state of indecision we ‘toss up for it’. But it is
rarely indeed that we are in a complete state of ignorance about the alternatives we are 
considering. The mere fact that we understand what we are talking about implies some kind 
of knowledge. We may, indeed, be unable to set exact values on our vague appreciation of 
the prior position, but that is not really the point. 

36. In an endeavour to avoid Bayes’s postulate several modern writers, notably Fisher,
Neyman and E. S. Pearson, have propounded methods which are widely accepted in one 
form or another by statisticians. The methods do not always agree between themselves, and 
in one famous case—the Behrens test—give different results; but they have the common 
object of attempting to escape from the necessity of using prior probability distributions. 
I shall discuss, in that order, maximum likelihood, fiducial inference and confidence intervals 
with the object of showing that there is, in fact, no escape except by the introduction of new 
assumptions. 

Maximum likelihood 

37. If a probability function f(x, θ) depends on a single unknown parameter θ, the
principle of maximum likelihood enunciates that to estimate θ we should take that value which,
for variations in θ, maximizes the likelihood of the observations; for instance, in a sample
of n independent observations, the function

$$L = \prod_{i=1}^{n} f(x_i, \theta). \qquad (7)$$

I have discussed this principle on a previous occasion (1940). All I need say here is 
(a) that in large samples the method gives much the same results as the method of Bayes, 
as is emphasized by Jeffreys; and 

(b) that the strong posterior recommendations of the method (many of which relate to
large samples) do not obviate the necessity for recognizing that the principle of maximum 
likelihood constitutes a new postulate. 
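Point (a) can be illustrated with Bernoulli observations, f(x, θ) = θ^x (1 − θ)^(1−x). The sketch maximizes the logarithm of (7) numerically by golden-section search, a standard technique chosen here only for simplicity. It then compares the result with the posterior mean (x + 1)/(n + 2) under a uniform prior. The samples, each with 60 % successes, are hypothetical.

```python
import math

def log_likelihood(theta, xs):
    """Logarithm of (7) for Bernoulli observations: sum of log f(x_i, theta)."""
    return sum(math.log(theta) if x else math.log(1.0 - theta) for x in xs)

def golden_max(g, lo=1e-9, hi=1.0 - 1e-9, tol=1e-10):
    """Locate the maximum of a unimodal function g on [lo, hi] by golden-section search."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    while b - a > tol:
        if g(c) > g(d):              # maximum lies in [a, d]
            b, d = d, c
            c = b - r * (b - a)
        else:                        # maximum lies in [c, b]
            a, c = c, d
            d = a + r * (b - a)
    return (a + b) / 2.0

ml_estimates, gaps = [], []
for n in (10, 100, 10_000):
    successes = 6 * n // 10
    xs = [1] * successes + [0] * (n - successes)        # hypothetical sample
    ml = golden_max(lambda t: log_likelihood(t, xs))
    bayes_mean = (successes + 1) / (n + 2)              # uniform prior (Bayes's postulate)
    ml_estimates.append(ml)
    gaps.append(abs(ml - bayes_mean))
    print(n, round(ml, 6), round(bayes_mean, 6))
```

The maximum-likelihood estimate is 0.6 in every case. The gap to the uniform-prior posterior mean falls from about 0.017 at n = 10 to about 0.00002 at n = 10,000. This agrees with (a), and it says nothing about the postulate raised in (b).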

38. The principle, in fact, is not obvious in the sense of being immediately and intuitively 
acceptable. It states that we are to proceed on the assumption that the most likely event 
has happened. Now, why? Even if we ignore any prior knowledge, for which the principle 
makes no allowance, and assume for the sake of argument that the most likely event is the 
most probable event (an assumption which appears to me to involve Bayes’s postulate), 
we know, or at least we believe, that the most probable event does not always happen. The 
justification for our principle can then only be that it leads us closer to the truth on the whole
than other methods. This appears to me to be another form of the postulate which is referred 
to above as necessary for the non-frequentist theory, namely, that what we believe is, on the 
whole, true. I therefore assert that the principle of maximum likelihood requires a new postulate
which is, in the last analysis, equivalent to one of the assumptions required to validate the
non-frequentist theory.

39. Here I interpolate one comment of general application. In seeking for a satisfactory
logical basis of modern methods, and in pointing out that some of them are not so firmly 
based as they seem, I am in no way trying to undermine the use of those methods. What
I am trying to do is to bring to light the latent assumptions on which they are founded. The
same applies to the other methods which I consider below. I think it is desirable to make this 
statement, partly to disarm those writers who may jump to the conclusion that I am laying 
the axe to the roots of their work (which is not my intention), and partly because there is 
a danger that critics of the new methods may swing too far the other way, and in finding 
that there are legitimate grounds for doubt about the logical foundations, may be tempted 
to reject the methods altogether. English writers may not go to these extremes, but the
following extract from Pietra (1948) will illustrate my meaning. Referring to some recent
publications by Gini on inverse probability, Pietra says (p. 70): ‘but there is no doubt now
that [Gini’s] revision [of the basis of probability in statistics] has brought to light the fallacious
character of the numerous illusions on which the great success of the Anglo-Saxon
development of our subject is based.’* An Anglo-Saxon may be pardoned for feeling a little nettled
by this kind of statement; but it represents a point of view which he would be wise not to
ignore. 

Fiducial inference 


40. The so-called method of ‘fiducial’ inference also requires a new postulate. The results
which it gives in some cases agree with those given by the theory of confidence intervals 
and by Jeffreys’s form of non-frequentist probability, but since it can give results which are 
not true in the former and is mostly advocated by those who repudiate the latter it can hardly 
be regarded as equivalent to either. The essence of the fiducial process appears to be this: 
if a frequency function of a sufficient statistic t and a parameter θ is

$$dF = \frac{\partial F(t,\theta)}{\partial t}\,dt, \qquad (8)$$

the fiducial distribution of θ is given by

$$dF = -\frac{\partial F(t,\theta)}{\partial \theta}\,d\theta. \qquad (9)$$

This is used to give a range within which the value of θ may be regarded as lying to an
acceptable degree of ‘probability’.
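The mechanics of passing from (8) to (9) can be sketched numerically. Purely as a hypothetical example, take a single observation t from an exponential distribution with mean θ, so that F(t, θ) = 1 − e^(−t/θ). Read (9) as dF = −∂F(t, θ)/∂θ dθ with t held at its observed value. On that reading, the fiducial 'probability' that θ ≤ θ₀ should come out as 1 − F(t, θ₀). The model and all names in the sketch are assumptions, not part of the text.

```python
import math

def F(t, theta):
    """Hypothetical example: distribution function of t when t is exponential with mean theta."""
    return 1.0 - math.exp(-t / theta)

def fiducial_density(t, theta, h=1e-6):
    """Equation (9): -dF(t, theta)/dtheta by a central difference, with t held fixed."""
    return -(F(t, theta + h) - F(t, theta - h)) / (2 * h)

def fiducial_prob_below(t, theta0, n=20_000):
    """Fiducial 'probability' that theta <= theta0, by the midpoint rule over (0, theta0]."""
    step = theta0 / n
    return step * sum(fiducial_density(t, (i + 0.5) * step) for i in range(n))

t_obs = 2.0
print(fiducial_prob_below(t_obs, 3.0))   # numerical integral of the fiducial density
print(math.exp(-t_obs / 3.0))            # closed form 1 - F(t, theta0)
```

The sketch shows only that the formal operation is well defined for this case. It does not settle whether the result is a probability in any frequency sense, and that is the objection raised in §§ 41 and 42.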

41. The first thing to note about an expression such as (9) is that it has a differential
element dθ. To a frequentist this means nothing in terms of his probabilities because θ is an
unknown constant and has no probability distribution other than the trivial one of unity
when θ has the true value. The introduction of fiducial inference then implies not merely
a new postulate about the behaviour of probability but a new kind of uncertainty. Exactly 
what this kind of uncertainty may be has never been explained, but it is not very unlike 
probability in the sense of degree of belief, at least in some contexts. 

42. Secondly, the transition from (8) to (9) is not by any means an obvious process, and 
I cannot myself see by what argument it is to be supported. At the least it amounts to a new 
postulate. Whether it is acceptable is a matter of taste, which perhaps is not worth discussing. 
The important point is that if one follows R. A. Fisher in rejecting Bayes’s postulate and the 
notion of prior probability in favour of the principles of maximum likelihood and fiducial 
inference, one is not making any economy in new concepts—in fact, the contrary. My 
personal feeling is that Bayes’s postulate is much more acceptable than the fiducial postulate. 
There is another complication in the use of fiducial inference which equally affects confidence 
intervals and I may as well proceed to it at once. 

* ‘Ma è anche fuori di dubbio ormai che dalla revisione stessa è emersa la fallacia delle molte illusioni
sulle quali fondava il suo maggiore successo l’indirizzo Anglo-Sassone della nostra disciplina.’

Confidence intervals 

43. The theory of confidence intervals is an ingenious attempt to set up a method of making statements in probability without the use of prior probabilities or Bayes's postulate, and equally without the use of an alternative postulate. It acknowledges the impossibility of making assertions that a parameter (an unknown constant) will lie, to specified degrees of probability, within specified limits, but meets the situation by substituting for such limits random variables whose values can be observed. It is then possible to choose two random variables $t_1$ and $t_2$, and to assert, with given probability of being right, that

$$t_1 \le \theta \le t_2$$

whatever $t_1$ and $t_2$ actually turn out to be when the observations are made. Since it is possible, at least in some cases, to choose $t_1$ and $t_2$ so that this probability does not depend on the unknown parameters, the probability of being right remains the same whatever values of $\theta$ are actually encountered in practice. The method thus appears to be independent of any prior distribution of $\theta$, and to be quite independent of any approach such as is embodied in Bayes's postulate.
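A small simulation can make the long-run reading concrete. The sketch below assumes a binomial sampling model and uses the familiar normal-approximation interval $\hat p \pm 1{\cdot}96\sqrt{\hat p(1-\hat p)/n}$ for $t_1, t_2$. That choice, the sample size and the function names are illustrative assumptions, not part of the text. It checks that the proportion of intervals containing a fixed $\theta$ stays near 0·95 whatever value of $\theta$ is used.

```python
import math
import random


def wald_interval(successes, n, z=1.96):
    """Normal-approximation interval (t1, t2) for a binomial proportion."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def coverage(theta, n=500, trials=1000, seed=1):
    """Long-run proportion of intervals (t1, t2) that contain the fixed value theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = sum(rng.random() < theta for _ in range(n))  # one binomial sample
        t1, t2 = wald_interval(x, n)
        hits += t1 <= theta <= t2
    return hits / trials


for theta in (0.3, 0.5, 0.6):
    print(theta, coverage(theta))
```

The three printed proportions should all be close to 0·95. This is the sense in which the method's guarantee does not depend on which $\theta$ is actually met.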



44. Consider again the example referred to in § 14. Suppose I assume that a sampling process is such as to reproduce a binomial distribution—there is a good deal of evidence for this in the case of births. I observe a value of 0·60 as the ratio of male to total births in a sample of 10,000. The theory of confidence intervals says that I may assert that the proportion $p$ lies between 0·59 and 0·61 with the probability that, if I make this type of assertion systematically in all similar cases, I shall be about 95% right in the long run. But I do not then make such an assertion, because I know too much about birth-rates to believe any such thing. The theory of confidence intervals gives no place to prior knowledge of the situation. How, then, can that theory provide a guide to conduct in making decisions?
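The limits 0·59 and 0·61 can be recovered with the usual large-sample formula for a binomial proportion, $\hat p \pm 1{\cdot}96\sqrt{\hat p(1-\hat p)/n}$. This is a sketch under that normal-approximation assumption. The text does not say how its limits were computed.

```python
import math

p_hat, n = 0.60, 10_000
se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the observed ratio
lower, upper = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"{lower:.4f} to {upper:.4f}")  # 0.5904 to 0.6096
```

To two decimal places this gives the 0·59 and 0·61 quoted.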

45. This difficulty arises in most of the accepted techniques of modern statistics which, while using judgements in probability, make no allowance for prior probabilities. I shall discuss confidence intervals to fix the ideas; but much the same considerations apply, for example, to the theory of testing hypotheses, to fiducial inference, to the analysis of variance and to sequential analysis. The fundamental problem is to find some method of incorporating prior knowledge or prior probabilities into the final probabilistic judgement of the situation. In order to consider it I need to recall briefly the nature of the theory of confidence intervals.

46. The diagram of Fig. 1 is a familiar presentation of confidence intervals for the binomial 
distribution. 

Biometrika 36




Reconciliation of theories of probability

For any given value $m$ and fixed sample number $n$ we may calculate the binomial distribution $(\chi + m)^n$ and, having decided on a confidence coefficient $\alpha$, find a range of values from $p_0$ to $p_1$ such that the proportion of the total distribution lying inside the range is $\alpha$. (A minor question as to discontinuities in values of $p_0$ and $p_1$ I ignore as not affecting the argument.) Having further decided (as I shall for simplicity, again without affecting the argument) to take central confidence intervals, we may map the confidence lines $L_1$ and $L_2$ by plotting, for each value of $m$, the appropriate values of $p_0$ and $p_1$. The confidence lines are, as it were, constructed horizontally by plotting the abscissae for selected ordinates.

To use the diagram we read it vertically. For a given abscissa (observed value of $p$) we read off on the confidence lines the corresponding ordinates (values of $m$), say $m_1$ (on $L_2$) and $m_2$ (on $L_1$). We then assert that

$$m_1 \le m \le m_2 \qquad (10)$$

with an assurance of being right, in the long run, in a proportion $\alpha$ of the cases in which we make this type of assertion.
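The horizontal construction and vertical reading of Fig. 1 can be imitated numerically. The sketch below makes several assumptions: a small sample ($n = 20$), a grid of $m$ values, and an "at least α" rule for the discontinuities. `central_range` plays the part of the horizontal strip from $p_0$ to $p_1$, expressed as counts. `confidence_limits` reads the belt vertically for an observed count. The names are hypothetical.

```python
import math


def binom_pmf(k, n, m):
    return math.comb(n, k) * m**k * (1 - m) ** (n - k)


def central_range(n, m, alpha=0.95):
    """Horizontal step: counts k0..k1 holding at least alpha of the mass, each tail <= (1 - alpha)/2."""
    tail = (1 - alpha) / 2
    pmf = [binom_pmf(k, n, m) for k in range(n + 1)]
    k0, below = 0, 0.0
    while k0 < n and below + pmf[k0] <= tail:  # trim the lower tail
        below += pmf[k0]
        k0 += 1
    k1, above = n, 0.0
    while k1 > 0 and above + pmf[k1] <= tail:  # trim the upper tail
        above += pmf[k1]
        k1 -= 1
    return k0, k1


def confidence_limits(x, n, alpha=0.95, grid=2001):
    """Vertical step: smallest and largest grid value of m whose horizontal range contains x."""
    inside = []
    for i in range(grid):
        m = i / (grid - 1)
        k0, k1 = central_range(n, m, alpha)
        if k0 <= x <= k1:
            inside.append(m)
    return min(inside), max(inside)


print(confidence_limits(12, 20))
```

Because the same tail rule is being inverted, the printed limits for 12 successes in 20 trials should lie close to the exact (Clopper–Pearson) limits, roughly 0·36 and 0·81.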

47. From the way in which the diagram is constructed it follows that, whatever the frequency distribution of $m$ may be in the cases to which we apply the method, a proportion $\alpha$ of the happenings will lie in the confidence belt between the confidence lines; for the proportion is $\alpha$ in each horizontal elementary strip, and hence is so for the belt as a whole even if different weights are assigned to different strips. Now let us suppose that we know something about the prior probabilities of $m$; to take an extreme case I shall suppose that we know that $m$ lies in a certain range, say $M_1$ to $M_2$. If we adhere to the confidence rules we shall still assert (10) and shall still be right in a proportion $\alpha$ of the cases. But nobody in his senses would assert (10) if the range $m_1$ to $m_2$ extended outside the range $M_1$ to $M_2$ at any point. We shall continue to be right in a proportion $\alpha$ of the cases if we assert the inequality

$$\gamma_1 \le m \le \gamma_2 \qquad (11)$$

where $\gamma_1$ is the greater of $m_1$ and $M_1$, and $\gamma_2$ is the smaller of $m_2$ and $M_2$. In short, we can modify and shorten our confidence intervals in the light of prior information without loss of accuracy.
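The rule in (11) amounts to intersecting two intervals. The sketch below makes three illustrative assumptions: a normal-approximation interval for $m$, a known range $M_1 = 0{\cdot}45$ to $M_2 = 0{\cdot}55$, and values of $m$ drawn uniformly inside that range. It checks that the shortened interval $(\gamma_1, \gamma_2)$ is right exactly as often as $(m_1, m_2)$ while being shorter on average.

```python
import math
import random


def truncate(interval, known_range):
    """(gamma1, gamma2) of (11): gamma1 = max(m1, M1), gamma2 = min(m2, M2)."""
    (m1, m2), (M1, M2) = interval, known_range
    return max(m1, M1), min(m2, M2)


def compare(known_range=(0.45, 0.55), n=200, trials=2000, seed=2):
    rng = random.Random(seed)
    hits_full = hits_short = 0
    width_full = width_short = 0.0
    for _ in range(trials):
        m = rng.uniform(*known_range)  # m always lies in the known range
        x = sum(rng.random() < m for _ in range(n))
        p_hat = x / n
        half = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
        m1, m2 = p_hat - half, p_hat + half
        g1, g2 = truncate((m1, m2), known_range)
        hits_full += m1 <= m <= m2
        hits_short += g1 <= m <= g2
        width_full += m2 - m1
        width_short += max(0.0, g2 - g1)
    return hits_full / trials, hits_short / trials, width_full / trials, width_short / trials


print(compare())
```

Since $m$ always lies between $M_1$ and $M_2$, the two hit rates agree exactly, as the text argues. Only the widths differ.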

48. To take a slightly more general case, let us now suppose that we have a known prior probability distribution of $m$, $f(m)\,dm$, which varies effectively in the range 0 to 1. We can still improve on the inequality (10) without loss of accuracy. We merely have to determine a domain in the square of Fig. 1 such that the total probability (allowing for variation of $m$ as well as of $p$) shall be $\alpha$, the domain obeying certain elementary requirements as to connexity and convexity of the confidence belt. The situation is essentially the same as if, in determining the confidence line, we started with the probability $f(m)(\chi + m)^n$ instead of the second factor only.
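The text leaves the choice of domain open. One concrete choice, assumed here and not taken from the text, is to take for each observed count the central α-range of $m$ under the weights $f(m)\,m^x(1-m)^{n-x}$. Each vertical section then carries probability α, so the whole domain does too when $m$ really is drawn from $f$. The prior used below (uniform on 0·4 to 0·6), the grid and the names are illustrative.

```python
import random


def prior(m):
    """Illustrative known prior f(m): uniform on 0.4 to 0.6."""
    return 5.0 if 0.4 <= m <= 0.6 else 0.0


def weighted_limits(x, n, f=prior, grid=501, alpha=0.95):
    """Central alpha-range of m under weights f(m) * m**x * (1 - m)**(n - x), on grid midpoints."""
    ms = [(i + 0.5) / grid for i in range(grid)]
    w = [f(m) * m**x * (1 - m) ** (n - x) for m in ms]
    total = sum(w)
    tail = (1 - alpha) / 2
    cum, lo, hi = 0.0, None, None
    for m, wi in zip(ms, w):
        cum += wi / total
        if lo is None and cum > tail:
            lo = m
        if hi is None and cum >= 1 - tail:
            hi = m
    return lo, hi


def joint_coverage(n=50, trials=1000, seed=3):
    """Proportion right when m is drawn from the prior and x from the binomial."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        m = rng.uniform(0.4, 0.6)
        x = sum(rng.random() < m for _ in range(n))
        lo, hi = weighted_limits(x, n)
        hits += lo <= m <= hi
    return hits / trials


print(weighted_limits(30, 50), joint_coverage())
```

The limits never leave the range where $f(m) > 0$, and the overall proportion right stays near α.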

49. Provided, then, that we know the prior distribution $f(m)$, we can incorporate it into the determination of our confidence belt without much difficulty. The essence of the method of confidence intervals is not that it takes $f(m)$ to be unity, to comply with Bayes's postulate (though that is its effect), but that it ignores $f(m)$ and can still obtain accurate statements in probability. It obtains (in this case and in all cases where ad hoc prior distributions are not introduced for special reasons) the same results as if Bayes's postulate were applied. But to avoid the use of the postulate and at the same time to maintain rigour it has to sacrifice a great deal. It pretends, so to speak, either that there is no prior probability in the frequency sense which is being used to set up the confidence intervals, or that any prior knowledge must be blended with the results of the theory in some manner unspecified before the final judgement is made. I suspect that the advocates of confidence intervals have often forgotten this fact; I think that they often do not appreciate its importance; I am certain that they fail to bring it out adequately in their expositions. I therefore assert that the theory of confidence intervals offers only a partial solution to the problem of estimation, and that to provide a basis of rational action it is necessary either to introduce prior probabilities into the theory or to find some new way of linking the theory with prior knowledge or prior expectation.

50. Two further points require stressing in connexion with confidence intervals. First, it is part of the hypothesis that the observations are being generated by a random process. We believe this, as a rule, on the basis of collateral evidence, i.e. on the basis of prior knowledge. I know of no convincing explanation why we are supposed to use this prior knowledge but to ignore any we may have about the parameter under estimate. Secondly, the method requires some theorem such as Bernoulli's, to the effect that if the probability of an event is $p$ it will happen in a proportion $p$ of the cases in the long run. There has been a good deal of discussion and some misunderstanding about the role of Bernoulli's theorem in the theory of probability. Essentially it is a proposition in pure mathematics, which may be expressed by saying that the proportion of total frequency in the binomial series $(\chi + m)^n$ in the neighbourhood of the mode $nm$ tends to unity as $n$ tends to infinity, 'neighbourhood' meaning within any fixed multiple of the dispersion $\sqrt{nm\chi}$, however small. It is not a justification of the frequency theory of probability (as Bernoulli himself seems to have thought) in the sense that it asserts anything about the frequency of happenings of equiprobable alternatives. No amount of mathematics, in fact, can assert anything about events which was not latent in the premisses concerning those events. Consequently, in relying on the result of the theorem (or anything equivalent to the effect that events of probability $p$ will happen in a proportion $p$ of the cases), the theory of confidence intervals, like the theory of direct probability from the frequentist viewpoint, requires the basic assumption that the processes with which it is concerned do behave in the manner required by the theory; in short, that random processes do exist.
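A numerical sketch of Bernoulli's theorem follows. It assumes the usual form, in which the neighbourhood of $nm$ is a band of half-width $\varepsilon n$ for a fixed $\varepsilon$. The binomial terms are summed through their logarithms so that large $n$ does not overflow. The choice $m = 0{\cdot}3$, $\varepsilon = 0{\cdot}05$ is arbitrary.

```python
import math


def mass_near_mode(n, m, eps):
    """Proportion of the binomial (chi + m)^n with |k - n*m| <= eps*n."""
    log_m, log_chi = math.log(m), math.log(1 - m)
    total = 0.0
    for k in range(n + 1):
        if abs(k - n * m) <= eps * n:
            log_term = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                        + k * log_m + (n - k) * log_chi)
            total += math.exp(log_term)
    return total


for n in (10, 100, 1000, 10000):
    print(n, round(mass_near_mode(n, 0.3, 0.05), 4))
```

The printed proportions climb towards unity as $n$ grows. This is a statement about the binomial series alone, which is Kendall's point.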

51. It would be tedious to trace the same points through other statistical techniques such as the theory of testing hypotheses. I will merely note that they occur in much the same form and, so far as I know, do not raise any essentially different problems. Not that there are no other parallel problems in statistical theory (conditional inference and order-statistics are two examples of cases where further examination of fundamentals is required), but the main points have been covered in the foregoing.

52. A friend of mine once remarked to me that if some people asserted that the earth rotated from east to west and others that it rotated from west to east, there would always be a few well-meaning citizens to suggest that perhaps there was something to be said for both sides, and that maybe it did a little of one and a little of the other; or that the truth probably lay between the extremes and perhaps it did not rotate at all. I wish, in conclusion, to emphasize that I am not attempting a compromise of this kind in endeavouring to reconcile the different theories which have been put forward by different authorities. It is not so much a question of choosing between viewpoints as of synthesizing them in order to get a complete picture of the whole.





REFERENCES

An account of the various statistical techniques mentioned in this paper is given in my Advanced Theory of Statistics, together with extensive references which I need not repeat. The specific works alluded to are:

Cramér, H. (1945). Mathematical Methods of Statistics. Uppsala: Almqvist and Wiksell.

Jeffreys, H. (1939). Theory of Probability. Oxford University Press. (2nd ed. 1948.)

Johnson, E. W. (1926). Logic. Cambridge University Press.

Kendall, M. G. (1940). On the method of maximum likelihood. J. R. Statist. Soc. 103, 387.

Keynes, J. M. (1921). A Treatise on Probability. London: Macmillan.

Kolmogoroff, A. (1933). Grundbegriffe der Wahrscheinlichkeitsrechnung. Berlin: Springer. (Reprint 1946, by the Chelsea Publishing Company, New York.)

Pietra, G. (1948). Studi di Statistica Metodologica. Milan: Giuffrè.

von Mises, R. (1928). Wahrscheinlichkeit, Statistik und Wahrheit, 3rd ed. revised 1936. Berlin: Springer. (English translation, 1939, W. Hodge, London.)





THE DERIVATION AND PARTITION OF $\chi^2$ IN CERTAIN DISCRETE DISTRIBUTIONS

By H. O. LANCASTER, M.B., B.S., B.A. (Sydney)

Rockefeller Fellow in Medicine 

Summary 

1. (1) It is shown how the general term of any multinomial can be reduced to a series of binomial terms, to each of which corresponds a value of $\chi^2$ for one degree of freedom. In an accompanying paper (pp. 130–4 below), Dr J. O. Irwin (1949) has shown that corresponding to this reduction there is an exact partition of $\chi^2$ and a certain Helmert matrix. This partition can be formally related to regression analysis by showing that it is equivalent to the selection of variables of the form $x_{i.jk\ldots}/\sigma_{i.jk\ldots}$.

(2) The expression for the probability of an $r \times s$ contingency table can be partitioned into the product of the probabilities of $(r-1)(s-1)$ fourfold tables, the $\chi$ of each of which is uncorrelated with that of any of the others. Asymptotically, when the expected frequencies are large, all the $\chi$'s are normally distributed, so that we have $(r-1)(s-1)$ normal and uncorrelated deviates. Any difficulties as to degrees of freedom are avoided in this proof.

(3) It is further shown that, corresponding to the method of treating the $r \times s$ table set out in (2), there is an exact partition of $\chi^2$ which can be obtained by pre- and post-multiplication of the $(r \times s)$ matrix of standardized variables by certain Helmert matrices. This operation makes the variables in the first row and in the first column all zero, leaving a matrix with $(r-1)(s-1)$ standardized and uncorrelated variables. Each of these variables is the $\chi$ of one of the component fourfold tables, when calculated by the use of expectations obtained from the original margins and not from the component table itself.

(4) Numerical examples are given for the case of a multinomial distribution, for the fourfold table and for a $3 \times 3$ table. In each case the partition of $\chi^2$ is illustrated.


The binomial and multinomial distributions, contingency tables and $\chi^2$

Introductory 

2. Proofs of the distribution of $\chi^2$ in contingency tables have been available since the now classic articles of Karl Pearson (1900), R. A. Fisher (1922, 1924) and G. Udny Yule (1922). However, the specification of $n$ standardized variables corresponding to the individual degrees of freedom, especially those corresponding to comparisons of practical interest, has often raised difficulties. Sometimes we may be interested in a partition which is only asymptotically exact for large samples. For instance, in the example quoted by Fisher (1944) of Weldon's dice throws, the binomial $(\tfrac23 + \tfrac13)^{12}$ was fitted to 26,306 throws of 12 dice. A test showed that the dice were biased ($\chi^2$ = 35·491 with 10 d.f.); but when the value of $p$ in the binomial was estimated from the data a satisfactory fit was obtained ($\chi^2$ = 8·179 with 9 d.f.). The difference is 27·312, and this is due to the difference between the estimated and theoretical values of $p$. In fact, the ratio of $(\hat p - \tfrac13)$ to its standard error is 5·20, the square of which is 27·04 and represents $\chi^2$ for 1 d.f. This partition is only approximate, but the equation

$$P(f_1, f_2, \ldots, f_n \mid p) = P(f_1, f_2, \ldots, f_n \mid \hat p)\, P(\hat p \mid p) \qquad (1)$$

suggests that the partition may be asymptotically true. In fact, an application of Stirling's formula shows that (1) is asymptotically equivalent to

$$(2\pi)^{-\frac12 n} \exp(-\tfrac12 \chi_n^2) = (2\pi)^{-\frac12 (n-1)} \exp(-\tfrac12 \chi_{n-1}^2) \times (2\pi)^{-\frac12} \exp(-\tfrac12 \chi_1^2), \qquad (2)$$

where $\chi_n^2$, $\chi_{n-1}^2$, $\chi_1^2$ are $\chi^2$ with $n$, $(n-1)$ and 1 d.f. respectively, and $n = 10$ in this case.
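The figures quoted for Weldon's dice can be checked directly. The drop in $\chi^2$ on estimating $p$ is close to the square of the standardized deviation of $\hat p$, and it is this near-equality that suggests an asymptotically exact partition:

$$
\begin{aligned}
35.491 - 8.179 &= 27.312,\\
5.20^2 &= 27.04.
\end{aligned}
$$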

In the present paper it is shown that the multinomial can be expressed as a product of $(n-1)$ independent binomials. Stirling's theorem is then applied first to the whole distribution and secondly to each of these binomial expressions. In the first case the usual value for $\chi^2$ is obtained, namely $\sum (x - m)^2/m$, where $x$ is the observed and $m$ the expected frequency. In the second case we obtain $(n-1)$ normally and independently distributed variables with unit variance and zero mean, the explicit forms of which arise naturally out of the algebra. By equating the usual $\chi^2$ to this second sum, it is clear that $\chi^2$ is distributed with $(n-1)$ degrees of freedom. This derivation is then generalized to show that $\chi^2$ for an $(r \times s)$ contingency table can be partitioned into the $\chi^2$'s of $(r-1)(s-1)$ fourfold tables with 1 d.f. each.

These partitions of $\chi^2$ are only asymptotically exact, but Dr J. O. Irwin has shown that they can be made exact, for any size of sample, for the multinomial distribution and for contingency tables. This requires a slight modification, which amounts to finding a matrix of Helmert's type to transform the standardized variables (observed − expected)/√(expected).

Notation

3. Multinomial. $p_i$ will refer to the probability of an observation belonging to the $i$th class ($i = 1, 2, \ldots, n$). The number of observations falling into the $i$th class will be $a_i$, and $\sum_{i=1}^{n} a_i = a$. The multinomial will thus be $(p_1 + p_2 + \cdots + p_n)^a$.

Contingency tables. A similar notation will be used for contingency tables. $p_{ij}$ will be the probability of an observation falling into the class in the $i$th row and $j$th column, where $i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, s$. $a_{ij}$ will be the observed number of observations in this class.

We define

$$p_{i.} = \sum_{j=1}^{s} p_{ij}, \qquad p_{.j} = \sum_{i=1}^{r} p_{ij}, \qquad (3)$$

and similarly for the $a_{ij}$. Further,

$$\sum_i p_{i.} = \sum_j p_{.j} = \sum_{i,j} p_{ij} = 1, \qquad \sum_i a_{i.} = \sum_j a_{.j} = \sum_{i,j} a_{ij} = a.$$

We also write

$$R_{ik} = \sum_{j=1}^{k} a_{ij}, \qquad C_{kj} = \sum_{i=1}^{k} a_{ij}, \qquad T_{kl} = \sum_{i=1}^{k}\sum_{j=1}^{l} a_{ij}.$$

i=l i=l j—1 



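The derivation of $\chi^2$ from discrete distributions by use of Stirling's approximation

4. In the multinomial distribution $(p_1 + p_2 + \cdots + p_n)^a$,

$$P(a_i \mid p_i, a) = a! \prod_{i=1}^{n} \frac{p_i^{a_i}}{a_i!} = \frac{(a_n + a_{n-1})!}{a_n!\, a_{n-1}!} \cdot \frac{p_n^{a_n} p_{n-1}^{a_{n-1}}}{(p_n + p_{n-1})^{a_n + a_{n-1}}} \times \text{general term of } \{p_1 + p_2 + \cdots + p_{n-2} + (p_n + p_{n-1})\}^a. \qquad (5)$$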





It is thus easily seen that $P(a_i \mid p_i, a)$ can be expressed as the product of the general terms of a binomial and of a multinomial with one fewer term. This process can be continued, and gives us the following:

Theorem (asymptotically true). Any general term of a multinomial expansion can be written so as to represent the product of $(n-1)$ binomial terms. Each of these terms can be represented by an expression of the form

$$\frac{1}{\sigma_i\sqrt{2\pi}} \exp(-\tfrac12 \chi_i^2).$$

They are uncorrelated. With large expectations, the $\chi_i$ are normally distributed and hence independent. Then $\sum_{i=1}^{n-1} \chi_i^2$ will equal $\chi^2$ as usually determined, will therefore be independent of the order of breaking up of the multinomial, and will be distributed as $\chi^2$ for $(n-1)$ degrees of freedom.

Proof. The first statement follows from the remark at the beginning of this section. That the $\chi_i$'s are uncorrelated follows from the fact that at any stage the residual multinomial does not depend on the factors $a_1, a_2$, etc., already isolated in determining the individual terms.

As $a \to \infty$ the $\chi_i$ as defined will tend to normality in virtue of the properties of the binomial. Since they are uncorrelated, they will then be independent. $P(a_i \mid p_i, a)$ has now been reduced to the product of $(n-1)$ binomial expressions, each of which may be reduced by Stirling's approximation to the form

$$\frac{1}{\sigma_i\sqrt{2\pi}} \exp(-\tfrac12 \chi_i^2), \qquad (6)$$

where

$$\sigma_i^2 = \frac{(a_n + a_{n-1} + \cdots + a_{n-i})(p_n + p_{n-1} + \cdots + p_{n-i+1})\, p_{n-i}}{(p_n + p_{n-1} + \cdots + p_{n-i})^2}. \qquad (7)$$
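The factorization used in (5) and in the theorem can be checked exactly with rational arithmetic. The sketch below is an illustration with hypothetical function names. It merges the last two classes repeatedly, records each binomial term, and confirms that the product of the $(n-1)$ terms equals the multinomial probability. It requires probabilities given as fractions that sum exactly to 1.

```python
from fractions import Fraction
from math import factorial, prod


def multinomial_prob(counts, probs):
    """a! * prod(p_i**a_i / a_i!), computed exactly."""
    coef = Fraction(factorial(sum(counts)), prod(factorial(c) for c in counts))
    return coef * prod(Fraction(p) ** c for p, c in zip(probs, counts))


def binomial_terms(counts, probs):
    """Split the multinomial into n-1 binomial terms by merging the last two classes, as in (5)."""
    assert sum(Fraction(p) for p in probs) == 1
    cnts = list(counts)
    prbs = [Fraction(p) for p in probs]
    terms = []
    while len(cnts) > 1:
        a_n, a_prev = cnts.pop(), cnts.pop()
        p_n, p_prev = prbs.pop(), prbs.pop()
        total_p = p_n + p_prev
        coef = Fraction(factorial(a_n + a_prev), factorial(a_n) * factorial(a_prev))
        terms.append(coef * (p_n / total_p) ** a_n * (p_prev / total_p) ** a_prev)
        cnts.append(a_n + a_prev)  # the merged class re-enters the residual multinomial
        prbs.append(total_p)
    return terms


counts = [3, 1, 4, 2]
probs = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
terms = binomial_terms(counts, probs)
print(len(terms), prod(terms) == multinomial_prob(counts, probs))  # 3 True
```

The loop ends with a single class of probability 1, whose residual "multinomial" term is exactly 1. That is why the product of the binomial terms reproduces the whole probability.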


5. This exact partition can be shown to be equivalent to finding $(n-1)$ uncorrelated variables from Karl Pearson's (1900) correlation matrix, when $a$ is indefinitely large, by using the methods of multiple correlation. We may use the notation of multiple correlation and take $n$ correlated variables $w_i = (a_i - a p_i)/\sqrt{a p_i(1 - p_i)}$, each of which has expectation zero and unit variance. Then the total and partial correlation coefficients, the variances and the regression coefficients are given by

$$r_{ij} = -\sqrt{\frac{p_i p_j}{(1 - p_i)(1 - p_j)}}, \qquad (8)$$

$$r_{ij.kl\ldots m} = -\sqrt{\frac{p_i p_j}{\{1 - p_i - (p_k + p_l + \cdots + p_m)\}\{1 - p_j - (p_k + p_l + \cdots + p_m)\}}}, \qquad (9)$$

$$\sigma_i^2 = 1, \qquad (10)$$

(11)

$$\sigma_{i.jk\ldots m}^2 = \frac{1 - (p_i + p_j + \cdots + p_m)}{(1 - p_i)\{1 - (p_j + p_k + \cdots + p_m)\}}, \qquad (12)$$

$$b_{ij.k\ldots m} = -\sqrt{\frac{(1 - p_j)\, p_i p_j}{(1 - p_i)\{1 - p_j - (p_k + \cdots + p_m)\}^2}}. \qquad (13)$$

It follows from the theory of normal correlation that $w_1/\sigma_1$, $w_{2.1}/\sigma_{2.1}$, $w_{3.12}/\sigma_{3.12}$, etc. are all asymptotically normally distributed with zero mean and unit variance, and that they are uncorrelated. Further, it is easy to see that $w_{n.12\ldots(n-1)} = 0$, so that the first $(n-1)$ of these standardized variables form a series of $(n-1)$ $\chi$'s and so give a partition of $\chi^2$ for the multinomial. It is easily verified that they give a series identical with our exact partition of $\chi^2$. This analogy can be extended to the case of the general contingency table.
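An exact partition of this kind can be produced with a weighted Helmert transformation of the standardized deviations $(a_i - ap_i)/\sqrt{ap_i}$. The first row of that transformation, $\sqrt{p_i}$, gives a component that is identically zero. Each remaining component compares class $k$ with the pooled classes before it. This is a sketch under that construction, with hypothetical names. It is not claimed to reproduce Irwin's or the author's exact ordering of components.

```python
import math


def helmert_components(counts, probs):
    """Components k = 2..n of the weighted Helmert transform of (a_i - a p_i)/sqrt(a p_i).

    Component k equals (p_k * A_{k-1} - S_{k-1} * a_k) / sqrt(a * p_k * S_{k-1} * S_k),
    where A and S are running totals of the counts and of the probabilities.
    """
    a = sum(counts)
    comps = []
    run_a, run_p = counts[0], probs[0]
    for a_k, p_k in zip(counts[1:], probs[1:]):
        new_p = run_p + p_k
        comps.append((p_k * run_a - run_p * a_k) / math.sqrt(a * p_k * run_p * new_p))
        run_a += a_k
        run_p = new_p
    return comps


def pearson_chi2(counts, probs):
    a = sum(counts)
    return sum((x - a * p) ** 2 / (a * p) for x, p in zip(counts, probs))


counts = [18, 25, 31, 26]
probs = [0.125, 0.25, 0.375, 0.25]
comps = helmert_components(counts, probs)
print(sum(c * c for c in comps), pearson_chi2(counts, probs))  # the two sums agree
```

Because the transformation is orthogonal and its first component vanishes, the $(n-1)$ squared components add up to Pearson's $\chi^2$ exactly, apart from rounding, for any sample size.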





Manifold contingency tables treated by $\chi^2$ by an extension of the fourfold table

6. Yule (1922) suggested that $(r-1)(s-1)$ independent comparisons could be made in the $r \times s$ contingency table, but did not give a proof that each corresponded to an independent value of $\chi$. It is the purpose of this section to show that every $r \times s$ table may be reduced to $(r-1)(s-1)$ fourfold tables; that the value of $\chi$ in each fourfold table will be independent of the other tables; and that these $\chi^2$ will be summed to give $\chi^2$ for $(r-1)(s-1)$ degrees of freedom. Finally, we shall prove that this value of $\chi^2$ is unique and equal to that derived by the formula (observed − expected)² ÷ expected, summed for every cell of the $r \times s$ table.

7. We shall first prove that a $2 \times 3$ table can be reduced to two fourfold tables and a $3 \times 3$ table can be reduced to four fourfold tables. We shall then extend this by induction to $r \times s$ tables.

We take as the null hypothesis that there is no association between the probability that an observation should fall in any row and in any column, i.e.

$$p_{ij} = p_{i.}\,p_{.j}.$$

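It is easily shown that the probability of the $2 \times 2$ table may be reduced by means of Stirling's approximation, so that

$$P(a_{ij} \mid a_{i.}, a_{.j}) \approx \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{\Delta^2}{2\sigma^2}\right), \qquad (14)$$

where

$$\Delta = a_{11} - a_{1.}a_{.1}/a, \qquad \sigma^2 = \frac{a_{1.}a_{2.}a_{.1}a_{.2}}{a^3}.$$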
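The accuracy of the normal form can be checked against the exact fourfold probability $\binom{a_{1.}}{a_{11}}\binom{a_{2.}}{a_{.1}-a_{11}}\big/\binom{a}{a_{.1}}$. The sketch takes $\Delta = a_{11} - a_{1.}a_{.1}/a$ and the large-sample variance $\sigma^2 = a_{1.}a_{2.}a_{.1}a_{.2}/a^3$, both assumed here. The margins are arbitrary illustrative values and the names are hypothetical.

```python
import math


def fourfold_exact(a11, r1, c1, a):
    """Exact P(a11 | margins): row-1 total r1, column-1 total c1, grand total a."""
    return math.comb(r1, a11) * math.comb(a - r1, c1 - a11) / math.comb(a, c1)


def fourfold_normal(a11, r1, c1, a):
    """Normal form exp(-Delta^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    r2, c2 = a - r1, a - c1
    delta = a11 - r1 * c1 / a
    var = r1 * r2 * c1 * c2 / a**3
    return math.exp(-delta**2 / (2 * var)) / math.sqrt(2 * math.pi * var)


for a11 in (250, 260, 270):
    exact = fourfold_exact(a11, 500, 500, 1000)
    approx = fourfold_normal(a11, 500, 500, 1000)
    print(a11, exact, approx, approx / exact)
```

With margins of this size the printed ratios should stay within about one per cent of unity.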


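The case of the $2 \times 3$ table

It follows that when there is no association we may write

$$P(a_{ij} \mid p_{ij}) = a! \prod_{i,j} \frac{p_{ij}^{a_{ij}}}{a_{ij}!} \qquad (i = 1, 2;\ j = 1, 2, 3)$$

$$= \frac{a!}{a_{1.}!\,a_{2.}!}\, p_{1.}^{a_{1.}} p_{2.}^{a_{2.}} \times \frac{a!}{a_{.1}!\,a_{.2}!\,a_{.3}!}\, p_{.1}^{a_{.1}} p_{.2}^{a_{.2}} p_{.3}^{a_{.3}} \times \frac{\prod_i a_{i.}! \prod_j a_{.j}!}{a! \prod_{i,j} a_{ij}!}. \qquad (15)$$

Thus the row and column totals are sufficient statistics for $p_{i.}$ and $p_{.j}$ respectively. Further, we have

$$P(a_{ij} \mid a, p_{ij}) = P(a_{i.} \mid a, p_{i.})\, P(a_{.j} \mid a, p_{.j})\, P(a_{ij} \mid a_{i.}, a_{.j}).$$

Hence

$$P(a_{ij} \mid a_{i.}, a_{.j}) = \frac{a_{1.}!\,a_{2.}!\,a_{.1}!\,a_{.2}!\,a_{.3}!}{a! \prod_{i,j} a_{ij}!} \qquad (16)$$

and, summing over all $a_{ij}$, $\sum P(a_{ij} \mid a_{i.}, a_{.j}) = 1$.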

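By a rearrangement,

$$P(a_{ij} \mid a_{i.}, a_{.j}) = \frac{R_{12}!\,R_{22}!\,a_{.1}!\,a_{.2}!}{a_{11}!\,a_{12}!\,a_{21}!\,a_{22}!\,T_{22}!} \times \frac{T_{22}!\,a_{1.}!\,a_{2.}!\,a_{.3}!}{R_{12}!\,R_{22}!\,a_{13}!\,a_{23}!\,a!}, \qquad (17)$$

the two terms on the right-hand side being the probabilities corresponding to the fourfold tables

$$\begin{array}{cc|c} a_{11} & a_{12} & R_{12} \\ a_{21} & a_{22} & R_{22} \\ \hline a_{.1} & a_{.2} & T_{22} \end{array} \qquad \text{and} \qquad \begin{array}{cc|c} R_{12} & a_{13} & a_{1.} \\ R_{22} & a_{23} & a_{2.} \\ \hline T_{22} & a_{.3} & a \end{array} \qquad (18)$$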




If $\chi_1$, $\chi_2$ are standardized normal deviates derived from these tables by Stirling's approximation, we see that $\chi_1$ depends only on the redistribution of observations in the first table with fixed marginal totals, and hence is independent of $\chi_2$ in the second table. Then $\chi^2 = \chi_1^2 + \chi_2^2$ may be calculated and has 2 d.f. When we come to the general case we shall prove that this is $\chi^2$ as usually calculated, within the limits of our approximation (Stirling).
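A simulation can illustrate the claim that $\chi_1$ and $\chi_2$ are uncorrelated when the margins are fixed. The sketch draws random $2 \times 3$ tables with given margins by shuffling column labels, which gives the no-association distribution. It splits each table into the two fourfold tables (the first two columns; then the pooled first two columns against the third) and computes for each a signed $\chi$ from that table's own margins. The margins, sample size and names are illustrative assumptions.

```python
import math
import random


def signed_chi(t):
    """Signed chi of a fourfold table [[x11, x12], [x21, x22]] from its own margins."""
    (x11, x12), (x21, x22) = t
    r1, r2, c1, c2 = x11 + x12, x21 + x22, x11 + x21, x12 + x22
    return (x11 * x22 - x12 * x21) * math.sqrt((r1 + r2) / (r1 * r2 * c1 * c2))


def random_table(row_totals, col_totals, rng):
    """2 x 3 table with the given margins under no association (random permutation)."""
    labels = [j for j, c in enumerate(col_totals) for _ in range(c)]
    rng.shuffle(labels)
    table = [[0, 0, 0], [0, 0, 0]]
    for position, j in enumerate(labels):
        table[0 if position < row_totals[0] else 1][j] += 1
    return table


def split(t):
    """First two columns; then the pooled first two columns against the third."""
    (a11, a12, a13), (a21, a22, a23) = t
    return [[a11, a12], [a21, a22]], [[a11 + a12, a13], [a21 + a22, a23]]


def correlation(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(7)
chi1, chi2 = [], []
for _ in range(10_000):
    first, second = split(random_table((30, 30), (20, 20, 20), rng))
    chi1.append(signed_chi(first))
    chi2.append(signed_chi(second))
print(round(correlation(chi1, chi2), 3))  # close to zero
```

The sample correlation should come out near zero. The mean of $\chi_1^2$ should come out near 1, as expected for one degree of freedom.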


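The case of the $3 \times 3$ table

$$P(a_{ij} \mid a_{i.}, a_{.j}) = \frac{\prod_i a_{i.}! \prod_j a_{.j}!}{a! \prod_{i,j} a_{ij}!} \qquad (i = 1, 2, 3;\ j = 1, 2, 3)$$

$$= \frac{R_{12}!\,R_{22}!\,C_{21}!\,C_{22}!}{a_{11}!\,a_{12}!\,a_{21}!\,a_{22}!\,T_{22}!} \cdot \frac{T_{22}!\,C_{23}!\,a_{1.}!\,a_{2.}!}{R_{12}!\,a_{13}!\,R_{22}!\,a_{23}!\,T_{23}!} \cdot \frac{T_{22}!\,R_{32}!\,a_{.1}!\,a_{.2}!}{C_{21}!\,C_{22}!\,a_{31}!\,a_{32}!\,T_{32}!} \cdot \frac{T_{23}!\,T_{32}!\,a_{3.}!\,a_{.3}!}{T_{22}!\,C_{23}!\,R_{32}!\,a_{33}!\,a!},$$

which may be represented in a schema as follows:

$$\begin{array}{cc|c} a_{11} & a_{12} & R_{12} \\ a_{21} & a_{22} & R_{22} \\ \hline C_{21} & C_{22} & T_{22} \end{array} \qquad \begin{array}{cc|c} R_{12} & a_{13} & a_{1.} \\ R_{22} & a_{23} & a_{2.} \\ \hline T_{22} & C_{23} & T_{23} \end{array}$$

$$\begin{array}{cc|c} C_{21} & C_{22} & T_{22} \\ a_{31} & a_{32} & R_{32} \\ \hline a_{.1} & a_{.2} & T_{32} \end{array} \qquad \begin{array}{cc|c} T_{22} & C_{23} & T_{23} \\ R_{32} & a_{33} & a_{3.} \\ \hline T_{32} & a_{.3} & a \end{array}$$

It is easy to see that a similar extension is possible to $r \times s$ tables. We see above that we have four independent values $\chi_1, \chi_2, \chi_3, \chi_4$, and $\chi^2 = \sum_{i=1}^{4} \chi_i^2$ with 4 d.f.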


The general case of $r \times s$ tables

Theorem. Any $r \times s$ table can be reduced to $(r-1)(s-1)$ independent fourfold tables, in each of which we can derive a value $\chi_i$. Further, $\sum_{i=1}^{(r-1)(s-1)} \chi_i^2$ is asymptotically equal to $\chi^2$ as usually calculated, and is unique and equal to (observed − expected)² ÷ expected, summed for every cell of the table.

The proof is by induction. We suppose it true for $r \times s$ tables and show it true for $(r+1) \times s$ tables:

$$P(a_{ij} \mid a_{i.}, a_{.j}) = \frac{\prod_i a_{i.}! \prod_j a_{.j}!}{a! \prod_{i,j} a_{ij}!} \qquad (i = 1, 2, \ldots, r+1;\ j = 1, 2, \ldots, s)$$

$$= \frac{\prod_{i=1}^{r} a_{i.}! \prod_{j=1}^{s} C_{rj}!}{T_{rs}! \prod_{i=1}^{r}\prod_{j=1}^{s} a_{ij}!} \times \frac{T_{rs}!\,a_{r+1,.}! \prod_{j=1}^{s} a_{.j}!}{a! \prod_{j=1}^{s} C_{rj}! \prod_{j=1}^{s} a_{r+1,j}!}. \qquad (21)$$




The first factor can be broken up into (r — 1) (s — 1) independent fourfold tables by hypothesis, 
and the second can be broken up into (s — 1) according to the scheme 


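$$\begin{array}{cc|c} C_{r1} & C_{r2} & T_{r2} \\ a_{r+1,1} & a_{r+1,2} & R_{r+1,2} \\ \hline a_{.1} & a_{.2} & T_{r+1,2} \end{array} \qquad \begin{array}{cc|c} T_{r2} & C_{r3} & T_{r3} \\ R_{r+1,2} & a_{r+1,3} & R_{r+1,3} \\ \hline T_{r+1,2} & a_{.3} & T_{r+1,3} \end{array}$$

and so on to

$$\begin{array}{cc|c} T_{r,s-1} & C_{rs} & T_{rs} \\ R_{r+1,s-1} & a_{r+1,s} & a_{r+1,.} \\ \hline T_{r+1,s-1} & a_{.s} & a \end{array} \qquad (22)$$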


Hence the $(r+1) \times s$ table can be broken up into $(r-1)(s-1) + (s-1) = r(s-1)$ fourfold tables. Therefore, if the first part of the theorem is true for $r$ rows it is true for $r+1$, and similarly for columns. Therefore this holds generally, since we have proved it for $2 \times 3$ and $3 \times 3$ tables.
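The induction can be checked exactly for particular tables. The sketch below uses hypothetical names. For $k = 2, \ldots, r$ and $l = 2, \ldots, s$ it builds from the cumulative totals $T_{kl}$ the fourfold table with cells $T_{k-1,l-1}$, $C_{k-1,l}$, $R_{k,l-1}$, $a_{kl}$. This reproduces the schemes given for the $2 \times 3$ and $3 \times 3$ cases and for (22). It then compares the product of the fourfold probabilities with $P(a_{ij} \mid a_{i.}, a_{.j})$, using exact fractions.

```python
from fractions import Fraction
from math import factorial, prod


def table_prob(table):
    """P(cells | margins) = prod(row totals!) prod(column totals!) / (a! prod(cells!))."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    num = prod(factorial(x) for x in rows + cols)
    den = factorial(sum(rows)) * prod(factorial(x) for r in table for x in r)
    return Fraction(num, den)


def fourfold_tables(table):
    """The (r-1)(s-1) fourfold tables of the induction, built from cumulative totals T[k][l]."""
    r, s = len(table), len(table[0])
    T = [[0] * (s + 1) for _ in range(r + 1)]
    for i in range(1, r + 1):
        for j in range(1, s + 1):
            T[i][j] = table[i - 1][j - 1] + T[i - 1][j] + T[i][j - 1] - T[i - 1][j - 1]
    tables = []
    for k in range(2, r + 1):
        for l in range(2, s + 1):
            top_left = T[k - 1][l - 1]                   # T_{k-1,l-1}
            top_right = T[k - 1][l] - T[k - 1][l - 1]    # C_{k-1,l}
            bottom_left = T[k][l - 1] - T[k - 1][l - 1]  # R_{k,l-1}
            tables.append([[top_left, top_right], [bottom_left, table[k - 1][l - 1]]])
    return tables


example = [[3, 1, 2], [0, 4, 1], [2, 2, 5]]
parts = fourfold_tables(example)
print(len(parts), prod(table_prob(t) for t in parts) == table_prob(example))  # 4 True
```

The intermediate totals cancel between neighbouring fourfold tables, as they do in the $3 \times 3$ product above. That cancellation is why the equality holds exactly and not just asymptotically.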

Thus we have proved that $P(a_{ij} \mid a_{i.}, a_{.j})$ is equal to the product of terms of the form $P(u_{ij} \mid u_{i.}, u_{.j})$, where the $u_{ij}$ are the four frequencies in the cells of a certain fourfold table. We have seen that

$$P(u_{ij} \mid u_{i.}, u_{.j}) \approx \frac{1}{\sigma_u\sqrt{2\pi}} \exp(-\tfrac12 \chi_u^2), \qquad (23)$$

that this term can be equated to a certain integral between certain limits, and finally that as an approximation we may say that $\chi_u$ is normally distributed. Since in each fourfold table the row and column totals are fixed, each $\chi_u$ is independent of the $\chi$ of every other fourfold table in the set, because the mean value of each $\chi_u$ is zero for given marginal totals; hence the proof follows as in the case of the multinomial.

Since the $\chi_u$ are uncorrelated and in larger samples may be taken to be normally distributed, they are also independent. Thus $\sum \chi_u^2 = \chi^2$ for $(r-1)(s-1)$ degrees of freedom.

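But Stirling's approximation applied directly to $P(a_{ij} \mid a_{i.}, a_{.j})$ by means of the substitution $a_{ij} = a_{i.}a_{.j}/a + g_{ij}$ gives

$$P(a_{ij} \mid a_{i.}, a_{.j}) \approx K_2 \exp\Big\{-\tfrac12 \sum_{i,j} a\, g_{ij}^2/(a_{i.}a_{.j})\Big\}, \qquad (24)$$

where $K_2$, which contains the factor $(2\pi)^{-\frac12(r-1)(s-1)}$, is of order $-\tfrac12(r-1)(s-1)$ in the $a_{ij}$. We note further that $\chi_u^2$ is of $O(a_{ij}^0)$, and $\sum_{i,j} a\, g_{ij}^2/(a_{i.}a_{.j})$ also of $O(a_{ij}^0)$.

It now suffices to equate

$$K_1 \exp(-\tfrac12 \textstyle\sum \chi_u^2) = K_2 \exp\Big\{-\tfrac12 \sum_{i,j} a\, g_{ij}^2/(a_{i.}a_{.j})\Big\} \qquad (25)$$

to obtain an identity true within the limits of the approximation, namely

$$\sum \chi_u^2 = \sum_{i,j} a\, g_{ij}^2/(a_{i.}a_{.j}). \qquad (26)$$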


Analysis of $\chi^2$ in manifold tables

8. By the substitutions

$$a_{ij} = a p_{ij} + t_{ij}$$

on the left-hand side and

$$a_{i.} = a p_{i.} + t_{i.}, \qquad a_{.j} = a p_{.j} + t_{.j} \qquad \text{and} \qquad a_{ij} = a_{i.}a_{.j}/a + h_{ij}$$

on the right-hand side, and then the use of Stirling's approximation, we may reduce the equation

$$P(a_{ij} \mid p_{ij}, a) = P(a_{i.} \mid p_{i.}, a)\, P(a_{.j} \mid p_{.j}, a)\, P(a_{ij} \mid a_{i.}, a_{.j})$$

to the form

$$K_3 \exp(-\tfrac12 \chi^2) = K_4 \exp(-\tfrac12 \chi_R^2) \exp(-\tfrac12 \chi_C^2) \exp(-\tfrac12 \chi_I^2), \qquad (27)$$

where $K_3$, $K_4$ are constants of the same dimensions in the $a_{ij}$, and

$$\chi_R^2 = \chi^2 \text{ due to rows} = \sum_i (a_{i.} - a p_{i.})^2/(a p_{i.}),$$

$$\chi_C^2 = \chi^2 \text{ due to columns} = \sum_j (a_{.j} - a p_{.j})^2/(a p_{.j}),$$

$$\chi_I^2 = \chi^2 \text{ due to 'interaction' or 'association'} = a \sum_{i,j} (a_{ij} - a_{i.}a_{.j}/a)^2/(a_{i.}a_{.j}),$$

$$\chi^2 = \chi^2 \text{ due to all causes of variation} = \sum_{i,j} (a_{ij} - a p_{i.}p_{.j})^2/(a p_{i.}p_{.j}). \qquad (28)$$

From (27) it follows that

$$\chi^2 = \chi_R^2 + \chi_C^2 + \chi_I^2,$$

with corresponding degrees of freedom

$$(rs - 1) = (r-1) + (s-1) + (r-1)(s-1).$$

We see now why the degrees of freedom are reduced when the $p_{i.}$ and $p_{.j}$ are estimated from the data: if $p_{i.}$ and $p_{.j}$ are estimated as $a_{i.}/a$ and $a_{.j}/a$ respectively, $\chi_R^2$ and $\chi_C^2$ both vanish. $P(a_{ij} \mid a_{i.}, a_{.j})$ is independent of the $p_{i.}$ and $p_{.j}$, and hence $\chi^2$ is minimized by choosing $p_{i.}$ and $p_{.j}$ so that $\chi_R^2$ and $\chi_C^2$ vanish. This is also the maximum likelihood solution.
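The components defined above can be computed for the $3 \times 3$ random-sampling table used in this paper's Example 1, taking $p_{i.} = p_{.j} = \tfrac13$. This is an illustrative sketch with hypothetical names. The decomposition holds only within the limits of Stirling's approximation, so $\chi_R^2 + \chi_C^2 + \chi_I^2$ should be close to the total $\chi^2$ but need not equal it.

```python
TABLE = [[3009, 2832, 3008],
         [3047, 3051, 2997],
         [2974, 3038, 3018]]


def components(table, p_row, p_col):
    """chi-square due to rows, columns, interaction, and all causes, as defined in the text."""
    a = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    chi_r = sum((x - a * p) ** 2 / (a * p) for x, p in zip(rows, p_row))
    chi_c = sum((x - a * p) ** 2 / (a * p) for x, p in zip(cols, p_col))
    chi_i = sum(a * (table[i][j] - rows[i] * cols[j] / a) ** 2 / (rows[i] * cols[j])
                for i in range(len(rows)) for j in range(len(cols)))
    chi_all = sum((table[i][j] - a * p_row[i] * p_col[j]) ** 2 / (a * p_row[i] * p_col[j])
                  for i in range(len(rows)) for j in range(len(cols)))
    return chi_r, chi_c, chi_i, chi_all


third = [1 / 3] * 3
chi_r, chi_c, chi_i, chi_all = components(TABLE, third, third)
print(f"rows {chi_r:.5f}  columns {chi_c:.5f}  interaction {chi_i:.4f}")
print(f"sum of parts {chi_r + chi_c + chi_i:.3f}  total {chi_all:.3f}")
```

The rows component agrees with the 3·61457 tabulated under Method I (a) to four significant figures. The sum of the three parts falls within about one per cent of the total.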

The exact partition of $\chi^2$ for manifold contingency tables

9. The partitions of $\chi^2$ which have been obtained are only approximate, tending to exactness as the size of the sample is indefinitely increased. We now put Dr Irwin's solution (pp. 130–4 below) for the exact partition of $\chi^2$ into matrix form. This gives a transformation of the standardized variables arranged in an $(r \times s)$ matrix to an $(r \times s)$ matrix whose first-column and first-row elements are zero.

Helmert matrices for rows $(r \times r)$ and columns $(s \times s)$ using $p_{i.}$, $p_{.j}$ are constructed thus:

$$R = (r_{ij}) = \begin{pmatrix} \sqrt{p_{1.}} & \sqrt{p_{2.}} & \sqrt{p_{3.}} & \cdots \\ \sqrt{\dfrac{p_{2.}}{p_{1.} + p_{2.}}} & -\sqrt{\dfrac{p_{1.}}{p_{1.} + p_{2.}}} & 0 & \cdots \\ \text{etc.} & & & \end{pmatrix}, \qquad (29)$$

$$C = (c_{ij}) = \begin{pmatrix} \sqrt{p_{.1}} & \sqrt{p_{.2}} & \sqrt{p_{.3}} & \cdots \\ \text{etc.} & & & \end{pmatrix}. \qquad (30)$$

We also write

$$Q = (q_{ij}) = \left(\frac{a_{ij} - a p_{ij}}{\sqrt{a p_{ij}}}\right). \qquad (31)$$

We again take as our null hypothesis $p_{ij} = p_{i.}p_{.j}$.

Consider the orthogonal transformation $RQ$, which gives rise to an $(r \times s)$ matrix of standardized variables. Then $Q'R'$ is an $(s \times r)$ matrix of standardized variables, and $CQ'R'$ gives rise to an $(s \times r)$ matrix of standardized variables. Hence $(C(RQ)')'$ is an $(r \times s)$ matrix of standardized variables, i.e. $RQC' = E = (e_{ij})$ is an $r \times s$ set of standardized variables.

We can now prove that $RQC'$ has zeroes in the first column and first row if the data are used to estimate $p_{i.} = a_{i.}/a$, $p_{.j} = a_{.j}/a$. Conversely, to make the elements of the first column and row zero we must estimate $p_{i.}$ and $p_{.j}$ by these two formulae. For

$$e_{11} = \sum_{k,l} r_{1k} q_{kl} c_{1l} = \sum_{k,l} \sqrt{p_{k.}}\,(a_{kl} - a p_{k.}p_{.l})\sqrt{p_{.l}}\big/\sqrt{a p_{k.}p_{.l}} = 0, \qquad (32)$$

$$e_{1j} = \sum_{k,l} \sqrt{p_{k.}}\,(a_{kl} - a p_{k.}p_{.l})\,c_{jl}\big/\sqrt{a p_{k.}p_{.l}} = \sum_l c_{jl}(a_{.l} - a p_{.l})\big/\sqrt{a p_{.l}} = 0 \quad \text{if } p_{.l} = a_{.l}/a. \qquad (33)$$




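Similarly

$$e_{i1} = 0 \quad \text{if } p_{i.} = a_{i.}/a. \qquad (34)$$

Conversely, if all $e_{1j} = 0$ we have

$$0 = e_{1j} = \sum_{k,l} c_{jl}\sqrt{p_{k.}}\,(a_{kl} - a p_{k.}p_{.l})\big/\sqrt{a p_{k.}p_{.l}}, \qquad (35)$$

$$0 = \sum_l c_{jl}(a_{.l} - a p_{.l})\big/\sqrt{a p_{.l}} = \sum_l c_{jl} u_l, \qquad (36)$$

where $u_l = (a_{.l} - a p_{.l})/\sqrt{a p_{.l}}$.

But this is a set of $s$ equations in $s$ variables and $|C| \neq 0$, so that $u_l = 0$, or $p_{.l} = a_{.l}/a$. The matrix $E$ is thus of the form

$$E = \begin{pmatrix} 0 & 0 & 0 & \cdots & 0 \\ 0 & e_{22} & e_{23} & \cdots & e_{2s} \\ 0 & e_{32} & e_{33} & \cdots & e_{3s} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & e_{r2} & e_{r3} & \cdots & e_{rs} \end{pmatrix} \qquad (r \text{ rows},\ s \text{ columns}). \qquad (37)$$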


The first transformation $RQ$ ensures that the members of any given column are uncorrelated. The second transformation $RQC'$ ensures that the members of any row are uncorrelated. Thus all $e_{ij}$ that are not zero are uncorrelated. We have therefore a set of $(r-1)(s-1)$ uncorrelated normal deviates with mean zero and unit standard deviation. Further,

$$\chi^2 = \sum (\text{observed} - \text{expected})^2 \div \text{expected} = \sum_{i=2}^{r}\sum_{j=2}^{s} e_{ij}^2, \qquad (38)$$

where $p_{i.} = a_{i.}/a$, $p_{.j} = a_{.j}/a$ have been estimated from the data. This is in effect the theorem on p. 291 of the article of J. Neyman & E. S. Pearson (1928).
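The double Helmert transformation can be carried out directly. The sketch below uses hypothetical names and plain Python lists. For the general Helmert row it assumes a rule that extrapolates the first rows of (29): row $k$ has $\sqrt{p_ip_k/(S_{k-1}S_k)}$ for $i < k$ and $-\sqrt{S_{k-1}/S_k}$ in position $k$, where $S_k$ is the cumulative sum of the probabilities. It estimates $p_{i.}$ and $p_{.j}$ from the margins and checks that the first row and column of $E = RQC'$ vanish and that $\sum e_{ij}^2$ equals the usual $\chi^2$. The data are the $3 \times 3$ random-sampling table of Example 1.

```python
import math

TABLE = [[3009, 2832, 3008],
         [3047, 3051, 2997],
         [2974, 3038, 3018]]


def helmert(p):
    """Weighted Helmert matrix: first row sqrt(p_i); row k compares class k with classes 1..k-1."""
    n = len(p)
    H = [[math.sqrt(x) for x in p]]
    S = p[0]
    for k in range(1, n):
        S_next = S + p[k]
        row = [math.sqrt(p[i] * p[k] / (S * S_next)) for i in range(k)]
        row.append(-math.sqrt(S / S_next))
        row.extend([0.0] * (n - k - 1))
        H.append(row)
        S = S_next
    return H


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


a = sum(map(sum, TABLE))
p_row = [sum(r) / a for r in TABLE]
p_col = [sum(c) / a for c in zip(*TABLE)]
Q = [[(TABLE[i][j] - a * p_row[i] * p_col[j]) / math.sqrt(a * p_row[i] * p_col[j])
      for j in range(3)] for i in range(3)]
E = matmul(matmul(helmert(p_row), Q), transpose(helmert(p_col)))  # E = R Q C'
chi2_usual = sum(q * q for row in Q for q in row)
print([[round(e, 4) for e in row] for row in E])
print(round(chi2_usual, 4), round(sum(e * e for row in E for e in row), 4))
```

Apart from rounding, the first row and column of the printed $E$ are zero. The two printed sums agree, because $R$ and $C$ are orthogonal.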

10. The author's solution of the $(r \times s)$ contingency table consists essentially of using a double Helmert transformation and testing out successive fourfold tables for 'interaction' or 'association'. It uses estimates of $p_{i.}$ and $p_{.j}$ derived from local row and column totals, instead of those derived from the marginal totals of the whole table. Any comparison involving row or column totals in these fourfold tables is set aside to be considered in the adjoining table.

If we use the marginal totals of the whole table, the partition can be made exact, as is shown in the proof above. A numerical example is given later which illustrates these points in detail.

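11. In the special case of the $2 \times 2$ table we have

$$q_{ij} = (a_{ij} - a p_{ij})/\sqrt{a p_{ij}}, \qquad p_{ij} = p_{i.}p_{.j}. \qquad (39)$$

Transform $(q_{ij})$ as follows:

$$\begin{pmatrix} \sqrt{p_{1.}} & \sqrt{p_{2.}} \\ -\sqrt{p_{2.}} & \sqrt{p_{1.}} \end{pmatrix} \begin{pmatrix} q_{11} & q_{12} \\ q_{21} & q_{22} \end{pmatrix} \begin{pmatrix} \sqrt{p_{.1}} & -\sqrt{p_{.2}} \\ \sqrt{p_{.2}} & \sqrt{p_{.1}} \end{pmatrix}$$

$$= \begin{pmatrix} q_{11}\sqrt{p_{11}} + q_{21}\sqrt{p_{21}} + q_{12}\sqrt{p_{12}} + q_{22}\sqrt{p_{22}} & -q_{11}\sqrt{p_{12}} - q_{21}\sqrt{p_{22}} + q_{12}\sqrt{p_{11}} + q_{22}\sqrt{p_{21}} \\ -q_{11}\sqrt{p_{21}} + q_{21}\sqrt{p_{11}} - q_{12}\sqrt{p_{22}} + q_{22}\sqrt{p_{12}} & q_{11}\sqrt{p_{22}} - q_{21}\sqrt{p_{12}} - q_{12}\sqrt{p_{21}} + q_{22}\sqrt{p_{11}} \end{pmatrix} \qquad (40)$$

$$= \text{matrix of } \chi\text{'s} = \begin{pmatrix} \text{total} & \text{columns} \\ \text{rows} & \text{'interaction'} \end{pmatrix}. \qquad (41)$$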




In this case $\chi(\text{total}) = 0$, since we made $a = \sum_{i,j} a_{ij}$. $\chi(\text{rows})$ and $\chi(\text{columns})$ become zero if we take $p_{i.} = a_{i.}/a$, $p_{.j} = a_{.j}/a$ respectively. We are reduced to the single 'interaction' term for 1 d.f. Alternatively, we may use the matrix to find $p_{i.}$, $p_{.j}$ such as to render $\chi(\text{rows})$ and $\chi(\text{columns})$ zero. The solutions will be $p_{i.} = a_{i.}/a$, $p_{.j} = a_{.j}/a$ respectively.


Examples of the partition of $\chi^2$ by several methods
Example 1

There were available data from a random sampling experiment which had been carried out for another purpose, comprising the totals of samples from Poisson populations with means 15 and 30. These were totalled for consecutive sets of 200 and 100 drawings respectively. These data are used because we are able to give the $p_{i.}$ and $p_{.j}$ theoretical values independent of the observed frequencies, and because the frequencies are large enough to illustrate how closely the sets of $\chi^2$ calculated by the different methods approximate one another.

We have analysed this table in three different ways. 

I (a). The first method treats each frequency as a member of a Poisson population with mean 26,974/9, that is, 2997·11, and s.d. √2997·11.

I (b). We have calculated the standardized variables $q_{ij} = 3(a_{ij} - \tfrac19 a)/\sqrt{a}$ and arranged them in the $Q$ matrix. We have calculated the matrices for rows and columns, $R$ and $C$, of Helmert form, using the theoretical value of ⅓ for the $p_{i.}$ and $p_{.j}$. Thus in this special case

$$R = C = \begin{pmatrix} 1/\sqrt3 & 1/\sqrt3 & 1/\sqrt3 \\ 1/\sqrt2 & -1/\sqrt2 & 0 \\ 1/\sqrt6 & 1/\sqrt6 & -2/\sqrt6 \end{pmatrix}. \qquad (42)$$

Since pre- and post-multiplication by such an orthogonal matrix leaves the sum of squares invariant, we obtain a matrix $E$ such that $\sum_{i,j} e_{ij}^2 = \chi^2$; $e_{11} = 0$ since we have used the total number in the calculation of the expected numbers in individual cells.

II. We have computed the usual y 2 = (observed —expected) 2 — expected, using the 
marginal totals to estimate p i and p j for the calculation of the expected. With these same 
values for p { and pj we have computed the matrix of standardized variables Q and the 
Helmert matrices R and C. 

III. Fourfold tables have been made according to the method used in the proof of §7. 
X 2 has been computed using the marginal totals of each of these fourfold tables so that the 
estimate of p x differs from table A to table B. We have taken the square root of y 2 to obtain y 
and given it a sign according to the sign of (observed — expected) in the top left corner of the 
same table, It is then easily seen that these four values closely approximate to the four 
values of corresponding to ‘interaction’ in the other two y matrices, thus giving a numerical 
verification of the asymptotic equivalence of the three methods of computing. 



126 


Partition of y 2 in certain discrete distributions 


The 3x3 contingency table (random sampling data) 


( Pij ~ 

1 /9 for all i , j. 

Expected total 27,000) 


3009 

2832 

3008 


8849 

3047 

3051 

2997 


9095 

2974 

3038 

3018 


9030 

9030 

8921 

9023 

| 26974 


Method 1 (a). Analysis of y 2 by usual method with p t _ = Pj = |. 



D.F. 

X 2 

Identification with 

E below 

Rows 

2 

3-61457 

Se?i 

Columns 

2 

0-82798 

Se?,- 

Residual 

4 

7-42107 

r 22 + ^23 + C lz 4- G 33 

Total 

8 

11-86372 

S e% 




hi 


Method I (6). 


Q, Matrix 

- 0-217165 
0-911281 
0-422153 

-3-015955 

0-984346 

0-746885 

O-198899-l 
-0 002030 
0-381561J 

II 

O 

II 

05 

- 0-57 7 350 
0-707107 

- 0-408248 

0-57 7 350 
-0-707107 
0-408248 

0-577350-1 

0 

-0-816496J 

BQ = 

r 0-407778 
-0-490814 
_ 0-805372 

-0-741735 
-2-828641 
- 1-439229 

0-333957-1 

0-142078 

-0-231172J 

B = iiQC' = 

r 0 

-1-834459 
i_-0-499424 

-0-812829 
1-653094 
- 1-587173 

-0-409012-1 
1-471167 
- 0-070020_ 


It will be noted that in matrix RQ the sum of the squares of the elements of the first row is 
equal to the y 2 in the table for columns. 

Method II. The value of y 2 = h(x~ m) 2 lrn. when the expectations are computed from the 
marginal totals, is 7'54:658. The Helmert matrices for rows and columns, when the y> L and 
p.j are estimated from marginal totals, are as follows: 


C ~ 0-578590 

0-576088 

0-578366U 

0-704957 

-0-709250 

° 

L 0-410206 

0-407723 

-0-815777J 

B = r 0-572762 

0-580669 

0-578590-1 

0-711937 

-0-702243 

° 

L 0-406311 

0-411920 

-0-815618J 

Q = r 0-867076 

-1-748555 

0-881236-1 

] 0-041607 

0-784907 

-0-822082 

L — 0-890199 

0-943216 

- 0-047324J 

Sum of squares 

= tabular y J = 7-54668, 





H. O. Lancaster 


127 


RQ= r o o 

0-580906 - 1-796060 

L 1-091440 - 1-156441 

Sum of squares = 7-54657 = X 1 . 

E = RQC'= r 0 0 

0 1-683409 

L 0 1-589624 

= the sum of squares = 7-54657 = xl- 

The transformation R applied to Q treats each column as though it were a separate 
multinomial. The two successive transformations of Q are both orthogonal and so leave the 
sum of squares invariant. The estimates of p { and p j are efficient (i.e. those of maximum 
likelihood) and so make the marginal elements corresponding to the row and column totals 


0 -i 

1-204680 

0-058022_ 


0 1 
-1-476733 
-0-071126J 


■zero. 

Method III. The formation of four fourfold tables (from which y 2 is calculated in the usual 
manner). This partition is only exact asymptotically. 


3009 

3047 

A 

2932 

3051 

5841 

6098 

5841 

6098 

B 

3008 

2997 

8849 

9095 

0056 

6883 

11939 

11939 

0005 

17944 


C 



D 


6056 

6883 

11939 

11939 

6005 

17944 

2974 

3038 

6012 

0012 

3018 

9030 

9030 

8921 

17951 

17951 

9023 

26074 


We may write for convenience in the form of a matrix 

X 2 = p-86004 2-180271 

[2-52637 0-00506J 

and corresponding to these values of 

y=R-691 — 1-477“} 

[l-589 - 0-071J 

The partitions of the x 2 of the contingency table by the above three methods evidently give 
approximately the same results. I and II are exact on the hypothesis made. 


Example 2 

The partition of y 2 is useful in the following type of case which arises frequently in bac¬ 
teriology. 

Measured constant amounts of a liquid suspension of a bacterial culture are mixed with 
an equal quantity of disinfectant solution of known concentration, and a plate is ‘poured’ 
and die number of colonies developing are noted. For each plate the concentration of dis¬ 
infectant used is given by some series such as l,r,r 2 ,r 3 ,..., where r is some factor such as 
2 or 1-5. In such a case the following results might be obtained, 

Number of colonies ( a t ) developing in successive plates 427, 440, 494, 422, 409, 310, 302. 
We are interested in finding the point at which the disinfectant began to inhibit growth, 
t is convenient now to write the cumulated sums 

427, 867, 1361, 1783, 2192, 2502, 2804, 



128 


Partition of y 2 in certain discrete distributions 

and the differences 4- + ... + a*_t - (fc -1) %}, —13) —121, 95, 147, 642, 690. Then we 

may illustrate the partition of X 2 by means of a table where the successive y® of the exact 
partition are given by 

Xl = +« a + - + a k _ i - (&- 1 )a k } 2 l{ak(k - 1)}, 

and the y 2 of the binomial partition are given by 

Xl -{ a i + Oj + • • ■ + ^h-i ~{k~ 1 )«'fc} 2 /(oi + a. a +... + a fc ) (k - 1). 

The total x 2 may be obtained by the usual formula 

= Hi(x — x) 2 jx = — Sk 

= 73-474 for 6D.S 1 . 


Binomial 

partition 

Exact 

partition 

Value 
of k 

0-195 

0-211 

2 

5.379 

6-092 

3 

1-687 

1-877 

4 

2-465 

2-697 

5 

32'947 

34-298 

6 

28-299 

28-299 

7 

70-973 

73-474 



The comparison in each case is that between plate k and the plates preceding it. In those 
cases where the null hypothesis is not true the discrepancy between the two methods will 
be high. Gumbel (1943) has criticized the x 2 lest in cases where the data are grouped, but 
from the point of view of the exact partition of x 2 given above his argument appears to have 
no force. 

Example 3 

Consider the following fourfold table (Roberts, Dawson & Madden, 1939, last line, p, 60): 



P 

P 


B 

942 

900 

1842 

b 

956 

936 

1892 


1898 

1836 

3734 


By the use of an expectation in each class of 933-5, we find x 2 *= 1-82860, The partition is: 



Exact partition 
using theoretical ratio 

Using 

observed data 

Rows 

0-66952 

0 

Columns 

1-02946 

0 

Interaction 

0-12962 

0-13965 

Total 

1-82860 (3 D.I-.) 

0-13965 (1 d.f.) 






H. 0. Lancaster 


129 


The same results would have been obtained had we made the transformation of the 
standardized variables by means of 



i r 


*i r 


ft ft 

§11 §12 

ft ~ft 


i i 


1 1 


"ft ft_ 

§21 §22 

V2 ft 

or using the from data 




r va. m 

§11 §12 

'ft.i ~ft, 


L- 

ftz. ftiJ 

-§21 §22. 

m ft.i ft.i 


I have to thank Prof. A. Bradford Hill, Department of Medical Statistics of the London 
School of Hygiene, for the facilities of his department. It is also a pleasure to record the 
assistance given by Dr J. 0. Irwin and Mr P. Armitage of the same department in helping 
to clear up doubtful points. The work was completed while the author was a Rockefeller 
Mow in Medicine. 


REFERENCES 

Fisher, R. A. (1922), On the interpretation of x* from contingency tables and the calculation of P. 
J, R. Statist. Soc. 85, 87. 

Fisher, R. A. (1924). The conditions under which x 1 measures the discrepancy between observation 
and hypothesis, J. R. Statist. Soc, 87, 442, 

Fisher, R. A. (1944). Statistical Methods for Research Workers, 9th ed. revised. Edinburgh: Oliver 
and Boyd Ltd. 

Cumber, E. J. (1943). On the reliability of the classical ^-test. Ann. Math. Statist. 14, 253. 

Irwin, J. 0. (1949). A note on the subdivision of x i into components. Biometrika, 36, 130, 

Nevman, J. & Pearson, E. S. (1928). On the use and interpretation of certain test criteria for purposes 
of statistical inference. Biometrika, 20 A, 175, 263, 

Pearson, K. (1900). On a criterion that a given system of deviations from the probable in the caso of 
a correlated system of variables is such that it can be reasonably supposed to have arisen in 
random sampling, PHI Mag, (5), 50,157. 

Boberts, E., Dawson, W. M. & Madden, Margeret (1939). Observed and theoretical ratios in 
Mendelian inheritance. Biometrika, 31, 56. 

Yule, G, U. (1922). On the application of the x 1 method to association and contingency tables, with 
experimental illustrations. J, R. Statist. Soo. 85, 95. 



[ 130 ] 


A NOTE ON THE SUBDIVISION OF f INTO COMPONENTS 

By J. 0. IRWIN 


1. Some years ago (Irwin, 1942) I derived the y 2 distribution for the weighted sum of 
squares of deviations from their weighted mean of n normally distributed variables, the 
weights being inversely proportional to the variances. 

By taking n linear functions of the original variables in which the coefficients are given by 
a matrix of Helmert’s type, Zw(x--z) 2 was partitioned exactly into the squares of (n-l) 
independent transformed variables each with unit variance. The result was 


n 

2 Wj(Xj-x) 2 — 


n— 1 


S 

i=l 


i 

i 

2" 

«k+i S w j 
i=* 1 

£ W.X, 


t+1 

i X i.+ 1 


E w } 

S w i 


L j =1 

i=l 

- 


( 1 ) 


If the original variables are normally distributed the y 2 distribution follows immediately. 

If a 1 ,a i , ...,a n are n observed frequencies with expectation m v m 2 ,.... m n , the familiar 
expression for y 4 , 2(« - m) 2 jm may be analogously partitioned. 

Let rrij = ap p where a is the total frequency expected, which is the same as the total 
observed frequency, if sample size is held constant, and consider the standardized variables 


«-*■*.»>• 

Transform these by means of the matrix [L] = [i iy ], where 
hi = 'J(Pj) (3= 1,2.»), 

l H = J\ptPjj ( (i = 2,3, j = l,2,..„t —1), 


h, = - 


i ~1 

s 

( 


\p'l S i 

1 / (=1 


Pi U -»). 


ki = 9 (i = 2,3 ,j = i + l,i + 2,...,»), 


(2) 


(3) 


It is easy to see that the matrix is orthogonal. 

The frequencies a £ (i = 1,2, —, n) are samples from a multinomial distribution. It is well 
known that this multinomial distribution can be regarded as a subdistribution of the M-fold 
distribution formed by n independent Poisson variates, when we introduce the restriction 
that the sample size a is to be held constant. 

Hence the variables in (2) can be defined to be independent standardized Poisson 
variates, and the restriction that a is to be kept constant can be dealt with later. 

The variates Z{ are independent, have zero means and unit variances, and are therefore 
unoorrelated. Since L is orthogonal the variates 


n 

*i = 2 kiXj (j = 1,2, 

3-1 



J. 0 . Irwin 


131 


hare also zero means and unit variances and are uncorrelated. Hence 

« fa-apt)* ' 

3=1 a Pj 


But 


i= i 


«1 = S 

3 = 1 


n 

s 

^=i 


s«? = s = s 


gj - affj 




1 n 

= 0 if 2 = a > 

I 1=1 


fcliat is, if the sample size a is kept oonstanb. 


Hence 

or after a little reduction 


S 


1=1 


or 

n 

s 

i=l 


n n 

S «3 = S 

1=2 3=1 





/ d± + d% + .. • “h dj 

«3.i\ 2 | 

( a i ~ aPjY 

1»- 
= -2 

1 

\2h+2>2+ ■■•+Pi 

Pj+ 1/ 

a Pj 

I 

1 

1 

1 




Pi+Pa+ — +Pj 

Pj+i ■ 


i 

[/ 

d^ d% dj 

%+i\ 2 ] 

(a.j — apf) 2 

£ **•> 
II 

i 

m x + m 2 + ... + m,j 

«b+i/ 

aPl 


1 

1 


i 


m 1 + »n 2 +... +m j 

m j +1 - 


(4) 


(S) 


The explicit form of the matrix [L'] for four variables is shown in Table 1, where y 0 = u v 
Xl = «„ etc. The transposition is purely for convenience in printing. 

Table 1. Explicit form, of the, matrix 1! for four variables. 

Standardized variables 



Ao 

Xi 

X 2 

As 

Xx-O/p 1 

■Jpi 

1 P2P1 

j PaPi 

/ P 4 P 1 

•Aopi) 

V pi(Pi+P2) 

V (Pi+Pi) {P1+P2+P3) 

-V (7->i+Pa+7b) (Pi+Pa+Ps+Tb) 


•JPi 

_ / Pi 

j PsPa 

/ biPa 

•Jim) 

V (bi+Pa) 

■V (Pi+Pi) (P1+P1+P1) 

-V (Pi + Pa+Pa) (Pi+7 3 a+7>s+7>4) 

x,-ap 3 

V?3 

0 

. / Pi+Pa 

/ P 4 P 3 

VWa) 

V bi+Ib+Kd 

V (Pi+Pa+Pa) (iPl +7 J a+P3 + P 4 ) 

*i -m 

•Jim) 

•Jpt 

0 

0 

j Pi+Vi+Pt 
■V Pi -l-Pa +713 + 7)4 


2. Now it is known that if X — [xy] (i = 1,2 = 1,2, ...,s) is the matrix of variables 

in a two-way (r x s) analysis of variance, and if the sum of squares for rows can be split up 
into the squares of (r — 1) linear components each corresponding to one degree of freedom 
and having a matrix of coefficients L 1 = [Hy], L 2 = [Ay] being the similar matrix for the 
columns’ sum of squares, then the matrix corresponding to the linear functions whose 
squares when summed form the interaction sum of squares is Q = [L 1 XL' i ). Here Lj is a 
(r- hr) matrix, X is (r x s) and L 2 is (s, s— 1), so that Q is (r — 1, s — 1), as it should be. 


For example, if 



9 - 3 ! 



132 


Subdivision of y 2 into components 


and £ is a (3 x 3 ) matrix, Q is a (2 x 2 ) matrix of linear functions of the, nine a;’s whose 
coefficients aie 


ill All 

inAia 

ill Al3 

iiiA 21 

illA 2 2 

illA 2 3 

il 2 A.i 1 

^12 Al 2 

il 2 Ais 

il 2 A 2 1 

il 2 A 22 

il 2 A 2 3 

^13 All 

^13 Aj 2 

43^13 

iiaA 2 i 

^3^22 

hi ^-23 

^21 An 

i'll A 12 

h\ Ais 

i 2 iA 2i 


hx^ii 

hi Ajl 

hi^-li 

^22 Al3 

hi A 2 i 

hi A -22 

hi A 2 3 

hi All 

hi A 12 

hi A 13 

hi^-u 

i 22 A 22 

^23 A 23 


By choosing L v L 2 to be matrices of the type given by (3) and X to be the matrix whose 
elements are given in (7) below, it follows that the interaction sum of squares is 

r— 1 s-1 

E S <•» ( 6 ) 

i-i 1=1 

where u ij = Z[A^]} 

and (l ik ) is a row vector and [AL] a column vector, or 

and this can be shown to reduc^i to 




(6 bis) 


Now let us consider an (r x s) contingency table where the observed and true expected fre¬ 
quencies are respectively 

ap i} (i = l,2,...,r;j = 1,2 . s), 


a , 




and choose standardized variables 


X ij ~ 


a ij ~ a Pjj 

jiapi]) ' 


Let 


Xj - 


h-Wi. 


' \f(ap. } ) 


(7) 

( 8 ) 


be standardized variables corresponding to the two sets of marginal frequencies. Then since 
5 = 0, the interaction sum of squares is given by 


i=U=l i=l 1=1 


and 


S x\. = £ = X 2 for 


1=1 


i=l 


rows, 


£ 4 = i = y 2 for columns. 

a P.j 


l=i 


l-i 


(9) 


( 10 ) 


If we substitute for respectively, the estimates a^Ja, a^ja, then the sums of squares 
in ( 10 ) vanish and (9) gives the usual y 2 expression for the contingency table. But the expres¬ 
sions obtained in ( 6 ) and (9) are equal, hence y 2 can be partitioned as in ( 6 ). 



J. 0. Irwin 


133 


For example, for a (3 x 3) table the four sets of coefficients are obtained by taking for L x , 
jj on p. 131 the matrices L(p x _, p 2 .> p 3 ), P. 2 > P.a), where 


L{p v Pi, Pa) 


J'. 


P 2 P 1 


J 


Pl(Pl+Pt&) 
PaPi _ 


{Pl+P*)(Pi+Pz+Pa) 



Pi 

(Px+Pa) 


J 


PaPi _ 

iPi+Pi){Pi+Pa + Pa) 


0 



Pi+Pa 
Pi+Pa + Pal 


If in association with the first set we write for the expected frequencies 


. _ gpi.p.i 

** (P.i+P.i)iPi.+Pa) 

e il + e i2 = e i.< 
e lj + e 2j ~ e .J< 

E = Se {J , 



on applying this set of coefficients to the standardized frequencies we easily reach 

xjE 


Xi = X = i, 


V( e .l e l. e . 2 e 2 .) 


{%! e 22 + ®22 e ll ~ &12 e 21 — a 21 e is) l 


( 11 ) 


( 12 ) 


which reduces to the usual expression for the fourfold table if the expected frequencies are 
calculated from the margins of the fourfold table itself and not as here from the margins of 
the original table. 

When the second set of coefficients are applied to the standardized frequencies, it is easy 
to verify that the result is the same as performing the following process: 

Form the fourfold table 


a ll 

j a 12 

+ tt 21 

+ $22 

a 31 

1 *32 


and note that the marginal and total expected frequencies are 



E{p l +p 2 .)l(Px+Pi,+Pa.) 

Ep 3 J{pi.+p 2 +Pz.) 

{Ep.i/(p.i+p. 2 )} {Ep.aKp.i+p.i)} 

E = a(p.i+2>. a )(Pi.+P 2 .+P3.) 


Then calculate x( = Xz) f rom the formula (12). 

The process is general; in particular, in this example y 3 is similarly obtained from the table 


a ll + a 12 

°13 

®21 ®22 

| ®23 

°U + «12 

®13 

+ ®21 + «22 

+ a 23 

®31 ^32 

a 33 


and Xi from 



,2; 


134 

More generally for a (m) contingency table, if 

svfc llx k ' 


i*l ' ' H 

the component fourfold tables are given by 


lH 




~ lt^i •" (^‘“l) 1 


V _ V W * 7 n 

V" h' n ■) 

Lw ^U+w 


ill be equal to the f of 


If the f from each of these tables be obtained from (12) ^ ^ ^ 
the whole contingency table, Dr Lancaster has given a formal proo 0 ^ y m ^ 

If any two orthogonal matrices of coefficients of <* 8 f an * . ^ fl. equeri oiea and 

'ft,, fa (i=1,2, j=1,2.«-1)» 0 6 con tribution8from (t-1) 

,en combined as indicated in this note, a partition of l ® t01 e f ™Kdivision of 

interest in particular cases, and the subject merits further exploration. 


REFERENCE 


Irwin, J, 0, (1942), On the distribution of a weighted estimate of variance 
in certain cases of unequal weighting, J, S, Statist. Soc. 105, 


aI1 d on analysis of variance 



[ 135 ] 


the first and second moments of some probability 

DISTRIBUTIONS ARISING FROM POINTS ON A LATTICE 
AND THEIR APPLICATION 

By P. V. KRISHNA IYER 

Department of the Design and Analysis of Scientific Experiment, University of Oxford 

1. Introduction 

A lattice of points is defined to be a rectangular array of points in any number of dimensions. 
If each point is assigned at random to one of k colours, discussion of the numbers of joins 
between points of the same colour, or between points of different colours, will involve many 
interesting and important probability distributions. The author (1949a) has examined, for 
both free and non-free sampling, distributions of the number of joins parallel to the axes of 
the lattice. In free sampling, the chance, p r , of a point taking the rth colour is fixed, and, 

k 

subject to the condition SPr — L is independent of the colour of the other points. In non- 

1 k 

free sampling, fixed numbers of points, say w 1( » 2 , ...,n kl such that Yi n r = total number of 

i 

points in the lattice, belong respectively to the colours black, white, red, etc. 

It has been shown in another communication (19496) that the results developed earlier 
(1949 a) can be used to determine whether a given distribution of diseased plants in a rect¬ 
angular plantation can be considered to be random or not. This is done by comparing either 
(1) the number of joins between two adjacent diseased plants, or (2) the number of joins 
between healthy and diseased plants adjacent to each other, with its expectation for the 
observed numbers of diseased and healthy plants in the field. The procedure there recom¬ 
mended does not take the diagonal joins into consideration. Since the spread of disease is 
not restricted to the directions of the axes of the lattice, it is very desirable to have a test 
based on all the possible joins between adjacent points in the lattice. 

Todd(1940) suggested that the number of diseased ‘doublets’, ‘triplets’, or ‘quadruplets’ 
(that is to say sets of 2, 3 and 4 adjacent diseased plants) might be used as a basis of test of 
randomness. He included diagonal adjacency, and proposed to test the significance of the 
observed number from its expectation by assuming the distribution to be binomial. Finney 
(1947) pointed out that the true variances of these distributions might be much greater than 
for binomial distribution, because of the non-independence of the individual doublets, 
triplets and quadruplets. For the last two he produced numerical evidence that Todd’s 
procedure seriously underestimated the variance, so that the test in the form proposed by 
Todd would often give spurious indications of non-randomness. 

In the nomenclature of the present paper, the number of doublets is the number of joins 
between adjacent black points, and the mean and the variance of the distribution are given 
by equations ( 2 * 1 * 5 ) and ( 2 * 1 - 6 ) below. Rather surprisingly, the true variance is less than 
that calculated from the binomial assumption. This can be seen by examining the difference 
between the formulae for the true and binomial variances. This is also true for joins parallel 
to the axes. 



136 


Probability distributions arising from points on a lattice 

The table below compares the true and the binomial variances with those obtained in 
Finney’s sampling experiment for different numbers of diseased plants. 


Variances for the distribution of doublets in a 10 x 10 lattice 


No. of 

diseased plants 

Variance 

True 

Binomial 

Finney’s sampling 

2 

0-064 

0-064 

0-08 

6 

0-620 

0-643 

0-88 

10 

2-049 

2-894 

1-81 

16 

6-697 

6-763 

4-90 

20 

9-698 

12-220 

10-67 

60 

36-141 

78-789 

— 


The present paper gives the first and the second moments for the probability distributions 
of the number of joins, taken in all possible ways, between adjacent points which are of 
(1) the same colour, (2) two specified colours, and (3) different colours, for free and non-free 
sampling, in two- and n-dimensional lattices consisting of mxn and l t x x ... x l n points 
respectively belonging to k colours. The results for free sampling in the case of two- and three- 
dimensional lattices have been given in Nature (Krishna Iyer, 1948). No attempt has been 
made here to discuss the higher moments and cumulants. 

It has been established (1949 a) that the cumulants of the different distributions for the 
number of joins, taken along the axes of the lattice, are linear expressions in the number of 
points on the sides and in the lattice. That is, for an mxn rectangular lattice, the k’s are 
linear expressions in ran, m and n; for an l x x l 2 x ... x l n w-dimensional lattice they involve 
a n , a n _ v ..., a 0 in the first degree, where the a’s are symmetric functions in Va as defined by 
MacMahon (1916). This is also true for the distributions considered in the present paper. 
Hence, for all the distributions, the y’s tend to zero when l v Z 2 ,..., l n approach infinity. That 
is, the distributions tend to the normal form when l v l 2 ,..,, l n tend to infinity. 

The test proposed by Todd may now be modified, however, so as to use the variance formula 
of the present paper (equation (2-1-6)), and will then provide a sound test of the deviations 
from randomness of disease incidence amongst the plants of a rectangular plantation. This 
is illustrated in § 4. 

2. Two-dimensional lattice 

2-1. First and second moments for the distribution of black-black 
joins for two or more colours 

It has already been shown (1949a) that the distribution of black-black joins remains the 
same whatever be the number of colours in the lattice. 

(a) Free sampling 

It has been shown (1949 a) that the rth factorial moment is r ! times the sum of the expecta¬ 
tions of the different ways of obtaining r black-black joins in the lattice. 

Let an mxn lattice consist of mn points of two colours, say black and white, with pro¬ 
babilities p and q = l—p respectively. The expected number of black-black joins is given by 

fi'i = A'sP*, (2'H) 




P. V. Krishna Iyer 


137 


where A 2 is the number of joins in the lattice. By considering the different ways in which the 
mn points of the lattice can be associated with the points surrounding them (dealing separ¬ 
ately with the four corner points, the 2(m+n— 4) border points and the (rn—2) (n—2) 
interior points), we see that 2A 2 can be expressed as 

3 ^i • 4 + ■ 2 (m + » - 4) + 8 Ci. (m - 2) (n - 2), 

so that ^ = 46-3a+ 2, (2-1-2) 

where b = mn, a = m+n. 

Two black-black joins can be obtained from (1) three adjacent black points, (2) four black 
points divided into two pairs. In (1) two black-black joins can be formed in B' 2 ways, where 

B 2 = 3 C 2 A + S C 2 . 2(m + a - 4) + gGj. (m - 2) (w-2) 

= 4(76— 9a 4-11). 

The total number of ways of obtaining two joins is \A' 2 (A 2 -1), and so the number in (2) 
is ^{A 2 (A' 2 - 1)- %B 2 }. Since the chances of having three and four black points are p 3 and 
p 1 respectively, we have 

Pm = 2B' 2 p 3 +{A' 2 (A' 2 -l)-2B' 2 }p\ (2-1-3) 

Adding and subtracting p' 3 we get 

p 2 = A' 2 p* + 2B' 2 p 3 -{A' 2 + 2B' 2 )p\ (2-1-4) 

(6) Non-free sampling 

As explained in the previous paper (1949a), the moments about the origin can be written 
down from the corresponding results for free sampling by substituting n ( {^b^ for p r , where 
Ttj is the number of black points in the lattice and n (r > is written for n(n- 1)... (n—r + l). 
This gives 

4„n,) = A' 2 nfW\ (2-1-6) 

P M = A' 2 n^+2B' 2 nfW ) +{A' 2 (A' 2 - l)-2 B' 2 }n^-{A' 2 n^} 3 , (2-1-6) 

where «,) denotes the rth moment about the mean for non-free sampling with black 
and n 2 white points. 

2-2. First and second moments for the distribution of black-white 
joins for two or more colours 

(a) Free sampling 

The chanoe of getting a black-white join is 2 p x p 2 , where p x and p 2 are the probabilities of 
the points being black and white respectively. The number of joins in an m x n lattice being 
A' 2 (see §2-1), the expected number of hlaok-white joins is 

P’i = 2A' 2 p 1 p 2 . (2-2-1) 

Two black-white joins can be obtained (1) from three adjacent points (two black, one 
white or one black, two white), (2) from two pairs of points (with one black, one white in 
each pair). The expectations for (1) and (2) are 

B'iPtPiiPi+Pi) 

“d 2{A' 2 (A' 2 -\)-2B' 2 }p\p\, 

respectively. Therefore 

= 2B iPiPiiPi+Pt) + MA' 2 (A' 2 - l)-2 B' 2 }pIpI 


(2-2-2) 



138 Probability distributions arising from joints on a lattice 

and ja 2 = 2A' 2 p l p 2 +2B' 2 p 1 p 2 (p 1 +p 2 )~4{A' 2 +2B' 2 )p\pl ( 2 - 2 . 3 ) 

which reduces to Pz = 2(A 2 + B 2 )p 1 p 2 -4(A 2 + 2B 2 )plpl ( 2 - 2 - 4 ) 

when there are only two colours in the lattice (since then p 1 + p 2 = 1 ). 

(b) Non-free sampling 

The moments about the origin can be got (1949 a) by substituting for p[p «i n 

the corresponding moments in (a). This gives 

/4<«„n,> = 24> 1 w a /6® ) (2-2-5) 

Hm. »,) = 2A' i n 1 n i lbW+2B' 2 n 1 n 2 (n 1 +n i -2)lb® 

+ 4 {4'(4'- 1 ) - 25'} wfM 2) /# 4) - 4{4£n 1 n a /6®}*. (2-2-0) 

For two colours (n x + n 2 = b) 

= [2(4 + 2Ji)/6» - MA' 2 (A ' 2 - 1) - 25'} (6 - 1)/6W] 

- 4[(4' + 25')/&<*>- 24' 2 (26 - 3)/(W>)] n 2 %£, (2-2-7) 

2-3. First and second moments for the distribution of the total number of joins 
between points of different colours for three colours 

(a) Free sampling 

It can easily be seen that fi[ = 2A' 2 1,p T p s . ( 2 - 3 - 1 ) 

The coefficients of Zp r p a and 'Ap\p\ (1 < r < s < 3) in the second moment are respectively the 
same as those of p x p 2 and p\p\ in // 2 for two colours. The term in p x p 2 p 2 arises from two joins 
formed out of ( 1 ) three adjacent points of three colours, and ( 2 ) four points of three colours 
divided into two groups, each having two adjacent points of different colours. The con¬ 
tributions of ( 1 ) and ( 2 ) above in the second factorial moment are 

6 B' 2 p x p 2 p 3 and 8{4£(4 2 -1) - 2B' 2 }p l p 2 p. i , 
respectively. Therefore the coefficient of p x p 2 p 3 in p{ 2 \ is 

[ 6 £' + 8 (^+ 0 ')], ( 2 - 3 - 2 ) 
where 0 2 = -(4£ + 25 2 ). Subtracting now the coefficient ofp x p 2 p a in (p'ff, i.e. 44 2 2 (Ep f p s ) 2 , 
from (2-3-2) we get 2(40 2 + 35 2 ). 

Thus for three colours 

N = 2(A' 2 + B' 2 )'Lp r p s -2{4:A' 2 + 5B 2 )p 1 p 2 p 3 -4(A 2 + 2B' 2 )^pfpl (2-3-3) 

(b) Non-free sampling p' 1{nM) = 2 A' 2 Xn r nJb® (2-3-4) 

/%»,«,»,) = P'Bn r n g +Qn 1 n 2 n 3 + Bi:nfnl (2-3-5) 

where P and B are the same as the coefficients of n^ and n\n\ in for two colours 

given in (2-2-7) and 

-Q = 2{4A l 2 + 5B' 2 )lb^-12{A , 2 (A' 2 -l)-2B , 2 }lbW-8A'fl{lP\b-l)}. 

2-4. First and second moments for the distribution of the total number of joins between 
points of different colours for four or more colours 

(a) Free sampling 

From the discussions in the previous paper (1949a) it can be seen that 
K = 2^Sp,.p s , (2-4-1) 

H = 2(4' + B' 2 ) Zp r p,- 2(44' + 5 B' 2 ) S p t p s p t - 4(4' + 215') [Sp*pf - 2 Zp r p 8 p t p u ]. ( 2 - 4 - 2 ) 



P. V. Krishna Iyer 


139 


(b) Non-free sampling 

/b(n 1 , 7 i a ,n a ,nj) = 24 2 Ym r Tigjb^ 1 , (2 , 4 , 3) 

Pitn 1 ,n t ,n»,n t ) = P^n r n 3 + QZn r n s n t +EEm*n* + SI, n r n a n,n w (2-4-4) 

where P, Q and B are the same as given in § 2-3 and 8 = - 2 R. 


3. K-DIMENSIONAL LATTICE 

The discussions in § 2 show that the first and the second moments for all the distributions 
can be written down if the following quantities for the x l 2 x ... x l n lattice are known: 
(1)4^, the number of joins, and (2) B' n , the number of ways of forming two joins from three 
adjacent points. 

Now, A‘ n for an x l 2 x ... x l n lattice can be calculated by extending the arguments given 
for the two-dimensional lattice leading to (2-1-2) and (2-1-3). 

< = H( 3 *-!)< + 2 ( 2 - 3n_1 -!)<-! 

+ 22(22.3“- 2 -1) <_ a + ... + 2«(2» -1) <}, 

B' n = M( 3 “ -1) (3 n - 2K + 2(2.3 n_1 -1) (2.3»-i - 2) 

+ 2 2 (2 2 .3»-2 -1) (2 2 . 3«-2 _ 2) a ;_ 2 +... + 2 n (2 n -1) (2» - 2) a'}, 


where the a”s* are symmetric functions in (lj-2), (l 2 - 2), ..., [l n ~ 2) as defined by Mac- 
Mahon (1915). 

In terms of the a’a in l v l 2 , .... l n , A' n and B' n reduce to 


K = 3»-l)a„-2.3»-K-i 

+ 2 2 .3 n-2 a„_ a +... - 1 ) n . 2 n a 0 }, 

B'n = i{(3 n - 1) (3™ - 2) a n - 2(3 2 *-2.5-3™) a„_ x 

+ 2 2 (3 s "-4.5 2 - 3*- 1 ) a n _ 2 +... + (-1)» 2 n (5 n - 3) a 0 }. 

We may now give the first and the second moments for the various distributions dealt 
with in § 2. 


3-1. Black-black joins for two or more colours 
(a) Free sampling ^ = 

P 2 = A' n p\ + 2B' n pl-(A' n +2B' n )p\. 


(3-1-1) 

(3-1-2) 


(b) Non-free sampling 

p' M = A' n nf^\ (3-1-3) 

P M = A' n nfla^ + 2B'nuffa^+{A' n (A' n - 1)-2B;}< ) R 4) -{4> ( 1 2 V^ ) } 2 - (3-1-4) 


3-2. Black-white joins for k colours 

(a) Free sampling 

Pi = 2 A'nptfi, (3-2-1) 

Pi = 2A' n p 1 p 2 + 2B' n p 1 p 2 (p 1 +p 2 )-l(A' n + 2B' n )plpl (3-2-2) 


(b) Non-free sampling 

P'i(n „nj 24 n 2 ja%\ (3-2-3) 

Px ni .n t ) = 24> 1 n a /a® + 2£> 1 n a (n 1 + n a - 2)/a®> 

+ 4 {A' n (A' n -1) - 2 B' n ) nfnfjaf - 4 {A^n^. (3-2-4) 


* a' r = S&-2) (^-2)^-2)... (Z r -2), aj = 1. 



140 


Probability distributions arising from points on a lattice 

3-3. Total number of joins between points of different colours for h colours 

(a) Free sampling 

= 2 A' n Zp r p 3> (3-3-1) 

fi 2 = 2(4 + B’ n ) Zp t p s - 2(4 A‘ n + 5B' n ) ZprPsP,- 4<4 + 2 B' n ) (2 p % r p%- 22p r p s p t p u ). (3-3-2) 

(b) Non-free sampling 

f J 'l(n h n i , njt) = 24 £w r «. s /(X®, (3-3-3) 

.+ 1)-2B;} (a,- l)K>]S» r « 3 

- [2(44 + ZB’JlaW - 12 { 4(4 - 1) ■- 2 4}R 4) - 8 4 a /KK - 1)}] S W( 

- 4((4 + 24)/a«J> - 2-4 2 (2a n - 3)/(aM>)] [Lnf »| - 22n r n a n t n u ]. (3-3-4) 

4. Application 

The distributions discussed in the present paper suggest that a test of significance for the 
random distribution of diseased plants in a rectangular plantation can be made by finding 
the standardized deviate for (1) the number of joins between adjacent healthy and diseased 
plants, and (2) the number of joins between adjacent diseased plants. (2) corresponds to 




X 

• 

• 



• 

• 

X 

■ 

• 


• 



• 


• 

• 

• 


X 

• 

X 

• 

■ 

X 


• 

■ 


• 




• 

X 

• 

• 

• 


• 

■ 

• 

■ 

• 


• 

• 


■ 


• 

■ 

X 


X 

• 

• 

• 

X 


• 

• 

• 


• 

• 


X 


• 

• 

• 



• 

• 

X 

• 


• 

X 

• 


X 

• 


X 


• 

X 

• 


• 

• 

■ 

■ 

• 



. X 

X 


• 

• 


• 


• 

X 

X 


• 

X 

• 

• 

X 


• 

X 

X 


• 

• 


X 




• 



• 

X 

X 

• 


X 

• 

• 


• 

X 


• 




• 


• 

• 


X 

X 


• 

. X 

X 


X 

X 


• 


X 





• 


X 

• 


X 


• 


X 

X 


X 




X 

X 

X 

X 

• 

X 

■ 

X 

• 

• 

X 


• 




X 

X 


X 



• 

• 


X 



X 

• 


• 











• 

• 

X 


X 

• 

• 


X 

• 


• 


X 


• 



• 


• 

• 


• 

• • 

X 

• 

• 











X 

. 


X 


. 

, 

. 

a 


X 








Pig. 1 


Todd’s method with the modification proposed in the introduction. The two methods are 
illustrated below for a field consisting of 20 x 15 plants, 60 of which are diseased and dis¬ 
tributed in the manner shown in Fig. 1. 

The numbers of diseased-healthy and diseased-diseased joins in the above configuration 
are 365 and 42 respectively. Their expected values and variances obtained by using the 























P. V. Krishna Iyer 


141 


expressions (2*2*5), (2*2*6), (2*1*5) and (2*1*6), are 352-21 and 139-56, and 43-29 and 30-58 
respectively. Thus the standardized deviates for the diseased-healthy and diseased-diseased 
joins are 

j 365-352-21 . A0 , 42-43-29 

= 1*08 and —— = -0-23, 


^139-56 


V30-68 


respectively. This shows that the distribution of the diseased plants can be taken to be 
random. 

5, Summary 


The paper gives the first and the second moments for the distribution of the number of joins 
between adjacent points of (1) the same colour, (2) two different specified colours, and 
(3) different colours, for m x n and { x l 2 x... x l n lattices of points of k colours, for free and 
non-free sampling. The joins considered include all the possible ones between adjacent points, 

including the diagonals. All the distributions tend to the normal form when l v l 2 . l n tend 

to infinity. These results have been applied for testing the departure from randomness of 
a given distribution of diseased plants in a rectangular plantation. 


The author’s thanks are due to Dr D, J. Finney and Dr John Wishart for going through 
the manusoript and for making very useful suggestions for improving the paper. 


REFERENCES 

Finney, D. J. (1947), The significance of associations in square-point lattice. J. B. Statist. Soc. Suppl. 
9,99. 

Krishna Iyer, P. V. (1948). Random association of pornts on a lattice. Nature, Lond 162,333. 
Krishna Iyeb, P. V. (1949a). The theory of probability distributions of points on a lattice. Ann, 
Math. Statist, (in press). 

Krishna Iyer, P. V. (19496). Random association of points on a lattice. J. Indian Soc. Agric, 
Statist, (in press), 

MaoMaion, P, A. (1915). Combinatory analysis, 1. Cambridge University Press, 

Todd, H. (1940). Note on random associations in a square-point lattice. J, B, Statist, Soc. Suppl, 



[ 142 ] 


PROBABILITY TABLES FOR THE RANGE 
By E. J. GUMBEL, New York 

In the following, the asymptotic distributions of the range and the midrange for any un¬ 
limited symmetrical distribution of the exponential type will be compared with Elfving’s 
distribution of the probability transformation of the normal range. The asymptotic pro¬ 
bability and distribution of the reduced range are given to five significant decimal places, 
the reduced range being a linear function of the range proper. 

1. Elfving’s distribution 

Elfving (1947) has given the following approach to the distribution of the range: Let z. 
and x n be the smallest and the largest among n observations taken from a known sym¬ 
metrical distribution and let be the probability of a value equal to or less than x. 


The introduction of two new variates £ and ?/ defined by 

£ = 2»Vt®fa)(l-®(*J)], V = ilg[4>(^)/(l-0>K))] (1) 

leads to the joint asymptotic distribution where 

fS,V) = tfe-* C0Sh '- ( 2 ) 

Integration over r/ yields the distribution of £ 

/(£) = tm), (3) 

where K 0 is a Bessel function in the designation of the British Association Mathematical 
Tables (1937). The author shows for the normal distribution that £ converges in probability to 

£ = 2n(l—<£(£«;)), (4) 

where w = x n - aq (6) 


is the range. Thus formula (3) is the distribution of the probability integral transformation 
of the normal range. The tables for the distribution/(£) and the probability F(£) may be used 
to check an observed distribution of ranges provided that the analytical form of the initial 
distribution its parameters and the sample size n are known. 

2. The asymptotic distribution of tiie range 

Instead of the probability integral transformation £ we now consider the range w proper. 
Let (l>(x) be any symmetrical unlimited distribution of the exponential type, let u be the 
expected largest value defined by 

<h(w) = 1 - 1/n, (6) 

and let a coefficient a, be defined by 

a = n<fi(u) ; (7) 

finally let the quantity R defined by the linear transformation 

R = a(w - 2u) (8) 

be termed the reduced range. If the sample size n is sufiiciently large, the extremes x 1 and x n 
are independent (Gumbel, 1946), and the distribution 

ir(R) = X Y'(R) 


(9) 



E. J. Gumbel 


143 


of the reduced range is obtained (Gumbel, 1947 a, b) from the convolution of the limiting 
distributions of the largest and of the smallest values given by Fisher & Tippett (1928) as 

being /■+» 

iJr(R) = e~ R \ exp \—e v — e- v ~ R ]dy. (10) 

J —co 

This distribution is subject to Bessel’s equation 

jJf"(R) + f\R)-e~ R f(R) = 0, (11) 

which leads to ^(R) = 2e _ - B f£ 0 (2e _iJi ). (12) 

This distribution is clearly asymptotic since it does not contain the sample size n, and it is 
distribution free, except for the conditions imposed on <t>(x), since any trace of the initial 
distribution $(oc) has disappeared. The distribution f%(w) of the range w itself is obtained by 
the usual procedure, as 

/a(«0 = af(R). (13) 

The two parameters a and u in (8) and (13) depend upon the initial distribution and the 
sample size. Since the mean reduced range R is the range of the reduced means, and since the 
variance of the reduced range is the sum of the variances of the reduced extremes, we have 

R = 2y, o% = £tt 2 . (14) 

Thus the two parameters a and u may be estimated from the observed mean range and the 
observed standard deviation of the range with the aid of (8) and (14). 

The generating function 0 R (t) of the reduced range (Gumbel, 1944) obtained from the 
generating function of the extremes is 

<?*(<) = P(l-t), (15) 

and the two betas are /3 X = 0-64928, /? 2 — 3 = 1-2, (16) 

i.e. one-half of the corresponding numbers for the largest value given by Fisher & Tippett 

(1928). These values allow us to complete the /? 1( /? 2 diagram traced by Pearson (1926), for 
normal extremes and normal ranges for 2g«gl000. Fig. 1 shows that the asymptotic 
values are situated on straight lines extrapolated from the last calculated values. The 
asymptote is more quickly reached for the range than for the extremes. 

The distribution (13) may be used to check an observed distribution of ranges, provided 
that the initial distribution is known to be symmetrical, unlimited and of the exponential 
type, and that the sample size is large enough. 

We do not need to know the sample size n. Furthermore, the knowledge of the analytical 
form of the initial distribution and its parameters is not required, since the parameters of the 
distribution of the range may be estimated from the observations themselves. These pro¬ 
perties mark the essential differences between the author’s theory and the method developed 
by Elfving. 

3. Relation between the two solutions 

According to Elfving, the distribution of the probability integral transformation of the 
range is given by equations (3), (4) and (5), whereas the author’s method leads to a dis¬ 
tribution of the range proper, given by equations (8), (12) and (13). 

The question arises of how these two . results are related. To this end we establish the 
asymptotic nature of Elfving’s variates £ and rj, for a symmetrical unlimited distribution of 



144 Probability tables for the range 

the exponential type. It has been shown (Gumbel, 1935), that under these conditions the 
probabilities Ofo) and <D(a;J of the smallest and of the largest values converge towards 

<&(*,) = - 1 - <&(K n ) = ^ e~ a<x n- u K (17) 

U 71 ’ 

Then the variate £ becomes, from ( 1 ), 

£ 2 = 4e -a t e n _;c i~ 2 “>, 

or, from (5) and ( 8 ), £ = 2e~ in , (18) 

which is an exponential function of the reduced range R. This relation is more general 
than (4). Since d£ = -e^ R dR 

Scale of/ffi 



the distribution \Jf(R) of the reduced range R becomes, from (3) and (18), 

fr(R) = 2e~l R K 0 (2e~ iR ) e~ iR , 

which is again formula ( 12 ). Thus Elfving’s approach yields the same result as the direct 

method developed simultaneously (Gumbel, 1947a). 

It is worth while to study also the second variate 77 , From ( 1 ) and (17) we obtain under 

the same conditions , , , . 

77 = ia(x 1 ~u+x n +u). 

This may be written 77 = \olv, (19) 

where 1 ;=*! + ^ ( 20 ) 






E. J. Gtjmbel 


145 


Table 1. The asymptotic probability integral, 'V(R), and the asymptotic 
distribution function, iJr{R), of the reduced range, R 


R 

Y(JR) 

f(R) 

R 

V(R) 

ir(R) 

Bl 

M-'(JS) 

f(R) 




1-5 

0-62645 

0-20346 

6-5 

0-9904510 

0-0080533 




1-0 

0-64647 

0-19693 

6-6 

0-9912243 

0-0074218 




1-7 

0-66482 

0-19016 

6-7 

0-9919389 

0-0068376 

-3-2 

0-00020396 

0-00096273 

1-8 

0-68349 

0-18321 

6-8 

0-9925933 

0-0062974 

-31 

0-00032305 

0-0014471 

1-9 

0-70146 

0-17613 

6-9 

0-9931977 

0-0067982 

— 3-0 

0-00049980 

0-0021244 

2-0 

0-71872 

0-16898 

7-0 

0-9937542 

0-0063370 

-2-9 

0-00076618 

0-0030496 

2-1 

0-73520 

0-16180 

7-1 

0-9942663 

0-0049111 

-2-8 

0-0011201 

0-0042862 

2 2 

0-75108 

0-15464 

7-2 

0-9947375 

0-0046180 

-2-7 

0-0016269 

0 0059003 

2-3 

0-76619 

0-14763 

7-3 

0-9951709 

0-0041553 

-2-6 

0-0023162 

0-0079687 

2-4 

0-78059 

0-14051 

7-4 

0-9955695 

0-0038207 

-2-5 

0-0032372 

0-010666 

2-5 

0-79429 

0-13360 

7-5 

0-9959359 

0-0035122 

-2-4 

0-0044486 

0-013707 

2-6 

0-80731 

0-12683 

7-6 

0-9962727 

0-0032278 

-2-3 

0-0060132 

0-017642 

2-7 

0-81966 

0-12023 

7-7 

0-9966822 

0-0029658 

-2-2 

0-0080016 

0-022253 

2-8 

0-83138 

0-11381 

7-8 

0-9968606 

0-0027244 

-2-1 

0-010490 

0-027049 

2-9 

0-84243 

0-10759 

7-9 

0-9971277 

0-0025021 

-2-0 

0-013669 

0-033864 

3-0 

0-86289 

0-10167 

8-0 

0-9973676 

0-0022974 

-1-9 

0-017291 

0-040916 

3-1 

0-86276 

0-065767 

8-1 

0-9976878 

0-0021091 

-1-8 

0-021769 

0-048797 

3-2 

0-87205 

0-090190 

8-2 

0-9977899 

0-0019358 

-1-7 

0-027077 

0-057483 

3-3 

0-88080 

0-084840 

8-3 

0-9979764 

0-0017704 

-1-0 

0-033291 

0-060924 

3-4 

0-88902 

0-079720 

8-4 

0-9981466 

0-0016298 

-1-6 

0-040484 

0-077049 

3-5 

0-89675 

0-074830 

8-6 

0-9983017 

0-0014960 

—1-4 

0-048721 

0-087708 

3-6 

0-903998 

0-070169 

8-0 

0-9984460 

0-0013711 

-1-3 

0-068064 

0-098971 

3-7 

0-910792 

0-065735 

8-7 

0-9085763 

0-0012573 

-1-2 

0-068627 

0-11063 

3-8 

0-917163 

0-061624 

8-8 

0-9980967 

0-0011627 

-11 

0-080168 

0-12232 

3-9 

0-923104 

0-057632 

8-9 

0-9988071 

0-0010566 

-1-0 

0-092994 

0-13419 

4-0 

0-928666 

0-053763 

9-0 

0-9989083 

0-00096837 

-0-9 

0-10700 

0-14599 

4-1 

0-933881 

0-050181 

9-1 

0-99900102 

0-00088737 

-0-8 

0-12218 

0-16768 

4-2 

0-938709 

0-046810 

9-2 

0-99908599 

0-00081302 

-0-7 

0-13861 

0-16881 

4-3 

0-943230 

0-043032 

9-3 

0-99916383 

0-00074479 

-0-6 

0-16593 

0-17956 

4-4 

0-947442 

0-040641 

9-4 

0-99923513 

0-00068218 

-0-6 

0-17440 

0-18909 

4-5 

0-951364 

0-037828 

9-5 

0-99930044 

0-00002474 

-0-4 

0-19384 

0-19909 

4-6 

0-955013 

0-035186 

9-6 

0-99936024 

0-00057206 

-0-3 

0-21419 

0-20768 

4-7 

0-968400 

0-032708 

9-7 

0-99941499 

0-00052374 

-0-2 

0-23536 

0-21536 

4-8 

0-961560 

0-030386 

9-8 

0-99946512 

0-00047944 

-0-1 

0-25723 

0-22208 

4-9 

0-964488 

0-028211 

9-9 

0-99951100 

0-00043883 

0 

0-27973 

0-22779 

5-0 

0-967207 

0-026177 

10-0 

0-99955300 

0-00040101 

0-1 

0-30276 

0-23246 

5-1 

0-969728 

0-024277 

10-1 

0-99959143 

0-00036760 

0-2 

0-32619 

0-23608 

5-2 

0-972066 

0-022502 

10-2 

0-99902659 

0-00033624 

0-3 

0-34993 

0-23866 

5-3 

0-974233 

0-020846 

10-3 

0-09965877 

0-00030761 

0-4 

0-37389 

0-24021 

6-4 

0-976239 

0-019303 

10-4 

0-99968820 

0-00028138 

0-5 

0-39794 

0-24075 

5-5 

0-978097 

0-017865 

10-5 

0-99971512 

0-00025735 

0-6 

0-42201 

0-24034 

5-6 

0-979816 

0-016527 

10-6 

0-99973973 

0-00023535 

0-7 

0-44598 

0-23902 

5-7 

0-981405 

0-016283 




0-8 

0-46978 

0-23685 

5-8 

0-982875 

0-014126 




0-9 

0-49333 

0-23389 

5-9 

0-984233 

0-013051 




1-0 

0-61654 

0-23021 

6-0 

0-985488 

0-012053 




11 

0-63936 

0-22588 

6-1 

0-986646 

0-011127 




1-2 

0-68169 

0-22097 

6-2 

0-987715 

0-010269 




1-3 

0-68352 

0-21554 

6-3 

0-988702 

0-0094729 




1-4 

0-60479 

0-20969 

6-4 

0-989612 

0-0087358 

j 




Biometrika 36 


10 







146 Probability tables for the range 

is the mid-range. Since £ is positive, the distribution Mv) of the variate y becomes, from (2), 
if we put £ cosh rj = t 

1 f °° 

after integration over £ Mv) = 2cos h*^ J 0 u ~ tfiL 

This distribution may be written 

2 2e~ 2 i 

(e^ + e - ’) 2 (1 + e -2 ’) 2 ' 

Consequently the distribution f 4 {v) of the mid-range 

v — 2rjjoi 

CL 

* 8 M v ) ~ _|_ e -au)2> ( 21 ) 

a result which is already known (Gumbel, 1944). The asymptotic distribution of the mid¬ 
range for a symmetrical initial distribution is symmetrical but not normal. This may be of 
interest since it refutes the widespread opinion that all measures of central tendency converge 
towards normality. Tables for the distribution (21) are easily constructed. 

4. Tables eor the asymptotic range 

The numerical values of the asymptotic distribution (10) and its integral, the probability 
T(f?) of the reduced range, arc not easy to obtain. The Calculation and Ballistics Department 
of the Naval Proving Ground (Dahlgren, Va.) has calculated tables for both functions by 
stepwise integration of the differential equation (11), using the special relay calculator of 
the International Business Machine Corporation. The calculations started from boundary 
values obtained from the British Association Tables (1937). The reduced range chosen varied 
from B = - 3-22 to B — 10-60 at intervals of 0-01. The results given to 8 and 9 decimal places 
may be in error by two or three units in the last place. These figures* are given in a condensed 
form in Tables 1 and 2. It is hoped that they will be sufficient for all practical purposes. 


Table 2. Percentage levels of the reduced range 


I 

2 

3 

4 

5 

6 

Probability 

Reduced range 







Number of 

Probability that 





samples 


Small 

Large 

Negative 

Positive 

N 



T(*a) 

Bi 

R s 



■ 

0-9996 


9-875 

2000 

0-999 

■ 


-2-829 


1000 

0-998 

0-0020 

0-9980 

-2-642 

8-314 

500 

0-996 

0-0026 

0-9976 

-2-578 

8-059 

400 

0-995 


0-9950 

-2-362 

7-260 

200 

0-990 



-2-118 

6-445 

100 

0-980 



-1-837 

5-611 

50 

0-960 

1 

0-9750 

-1-737 

5-337 

40 

0-950 

0-0500 


-1-386 

4-464 

20 

0-900 

H 


-0-949 

3-544 

10 

0-800 

; : 1 



2-543 

5 

0-600 



-0-133 

2-193 

4 

0-500 


The author wishes to express his sincere appreciation for permission to reproduce these tables. 




























E. J. Gumbel 147 

Column 1 of Table 1 gives the reduced range R, columns 2 and 3 give the probability integral 
and the distribution function t/r(R), respectively. 

The percentage levels obtained from the original tables (not reproduced here) are given 
in Table 2, in which the reduced ranges R (cols. 3 and 4) are written down as functions of 
certain values of the probabilities 'F(.R), (cols. 1 and 2). Col. 5 is the inverse of col. 1. It may 
also be interpreted as the solution of 

¥(*„)-1-1/tf, (22) 

where R N , defined in analogy to (6), stands for the expected largest reduced range in N 
samples. Col. 6 is obtained from the differences of cols. 2 and 1. The reduced ranges (cols. 
3 and 4) of Table 2 are traced against the probabilities (cols. 1 and 2) and the numbers N 
(col. 6) in Fig. 2. For the sake of completeness, the curves are extended from 1 F = 0-0003 and 
1 _*F = 0-0003 up to the median by use of Table 1. 


Upper probability level 

99-97 * 99-9 99-5 99 95 90 80 50 % 



The mode R, the median and the mean R are, respectively, 

R = 0-506366440; $ = 0-928597642; R = 1-154431330. 

Conclusions 

The asymptotic distributions of the ranges and mid-ranges for an unlimited symmetrical 
initial distribution of the exponential type are obtained from the convolution of the 
asymptotic distributions of the extremes. Elfving’s approach for the normal range leads 
to the same results as this direct method which does not require the knowledge of the sample 
size, nor of the analytical form of the initial distribution, nor of the numerical values of its 
parameters. 


IO-2 




Britt Aesociation (1931), MdemU Mu, 6 , Ml Fmlim, Bart i: Functions of order» 
and unity. Cambridge Univsreity Press, 

Emm, G. (1947), The asymptotical distribution of range in samples from a normal popula(i 0ll 
Biomefriin, 34, Ill, 


fern, R, A, A tan, L, H, C. (1918), Limiting forms of lie frequency distribution of tl* tag 
or smallest member of a sample, Pm. Cam P'd See, 21,189, 



Gduh, E, J, ( 1 M). Ranges and midrange;, in M SUk 15, ili, 

Guam, E, J, (1948). On the independence of the extremes in a sample, in Mali Stalk 17 71 
Gnna, E, J, (1947a), The asymptotic distribution of the range, Bi, Am, Mali Sac, 531 ' 
Gumbei, E, J. (19476), The distribution of the range, Am M Mil, 18, 384, 

Bhmoh, E, S, (1928), A further note on the distribution of range in samples taken from a normal 



[ 149 ] 


SYSTEMS OF FREQUENCY CURVES GENERATED 
BY METHODS OF TRANSLATION 

By N, L. JOHNSON 

1. Introduction 
1-1. Preliminary remarks 

This paper is concerned with the discussion of some of the uses which may be made of trans¬ 
formations of variables such that the transformed variables may be considered to have a 
normal distribution. The concept of such transformations was put forward by Edgeworth 
(1898) and termed by him the Method of Translation. Edgeworth considered, in fact, only 
transformations which could be represented by polynomials, as did Kapteyn (1903). Later, 
however, Kapteyn & Van Uven (1916), Wicksell (1917) and Rietz (1922) extended the 
method to more general kinds of transformation. As we shall see later, the particular case of 
the logarithmic transformation, which is given some prominence in each of the last two 
references, had been anticipated by other authors who had not, however, considered the 
transformation as more than a special device applicable to particular cases. 

1*2. Historical development 

It is of interest to consider the reasons why the need for such transformations should have 
arisen.. There is no doubt that in the earlier phases of their development the primary object 
was that of graduating observed frequency distributions. The normal distribution had played 
a dominant role in both theoretical and applied statistics since the time of Laplace. It was, 
however, apparent that the normal curve could not provide an adequate representation of 
many of the distributions encountered in statistical practice. Towards the end of the nine¬ 
teenth century attempts were made to construct systems of frequency curves which should 
be capable of representing a wider variety of distributions than those for which a normal 
curve would suffice. It may be noted that the most obvious departure from normality was 
that which is described as skewness, and that much of the work at this time was described 
as the construction of systems of ‘skew frequency curves’. The most successful of the 
systems then proposed have been those of K. Pearson (1895) and Charlier (1906). The work of 
Edgeworth and others, referred to in § L1, constitutes a third line of approach which, though 
not so widely used as those of Pearson and Charlier, has certain advantages of its own. In 
view of the important position occupied by the normal curve, it was, of course, natural to 
consider the possibility of relating observed distributions to the standard form. The fact 
that functions associated with the normal curve were well tabulated must also have been 
a strong contributory factor. An important reason for the lack of general acceptance of the 
method of translation is the fact that it became apparent that compared with the Pearson 
system the curves proposed covered only a very limited variety of shapes. A similar criticism 
might, of course, be directed at the Charlier system, though the latter system possesses 
advantages in respect of the aid which its analytic form offers to theoretical investigations. 

The main purpose of this paper is to propose certain systems of curves derived by the 
method of translation, which, it is hoped, retain most of the advantages while eliminating 
some of the drawbacks of the systems first based on this method. 



150 


Frequency curves generated by methods of translation 

1-3. Transformation to normality 

Subsequently to the construction of the systems of curves described in § 1-2, the normal 
distribution has gained added importance as a result of developments in statistical theory, 
In particular, the theory of significance tests and the associated probability distributions 
have been worked out much more thoroughly for normal populations than for other cases, 
Originally this may have been a consequence of the theoretical importance of the normal 
distribution, based on Laplace’s theorem and the central limit theorems, but a factor of 
considerable importance is the simplicity of results based on normal populations. ‘Normal 
theory’, as it may be termed, is so much simpler than theory based on any general system of 
curves that it is of great importance to be able to use it if possible. To this end, two lines of 
inquiry have been put forward. E. S. Pearson (1931) and R. C. Geary (1947), inter alia, have 
considered the problem of how far normal theory may be invalidated by various kinds of 
departure from normality in the original distributions. The other approach is in effect an 
application of the method of translation. A function of the observed variable is sought which 
shall be, with sufficient approximation, a normal variable. Normal theory, with its simplicity 
and convenience, is then applied to the transformed variables. Curtiss (1943) gives a good 
critical summary of many of these methods. It may be noted that the interest, in these 
applications, lies in the significance tests to be applied and not in the creation of systems of 
frequency curves. 

A further application of the method of translation is found in the approximate normaliza¬ 
tion of certain test criteria. In this case it is implicitly assumed that the original distribution 
is normal, and the method of translation is used to simplify certain parts of ‘ normal theory’. 
Examples are the Wilson-Hilferty (1931) transformation of y 2 , and the transformations 
proposed by Hotelling & Frankel (1938) and by Cornish & Fisher (1937). 

1-4. General theoretical background 

Pretorius (1930), in the course of a long paper dealing with non-normal distributions, 
remarks: ‘The superiority of one frequency function over another depends rather on the 
success with w r hich that function can be applied to graduate data than on the manner in 
which it originated.’ This point of view has much to recommend it and, if accepted, absolves 
us from the necessity of providing a plausible probability theory basis for any proposed 
system of frequency curves. On the other hand, it must be remembered that the normal 
curve was first reached from probability theory rather than from the graduation of data, 
While, therefore, from a utilitarian point of view a probability theory basis is unnecessary, 
it is useful to keep the theory in mind when constructing new systems. For example, Pearson’s 
fundamental differential equation was based on certain considerations of probability, 
though it was applied in cases where these considerations could hardly be presumed valid. 
Rather similarly the method of translation can be related to probability theory in a general 
and somewhat tentative manner. The argument, as described below, is due to Kapteyn 
(1903) and Wieksell (1917). 

The normal distribution can be considered as arising from the summation of a large 
number of small independent effects which have occurred in a specified order. If it be now 
supposed that the magnitude of an effect be proportional to some function of the value of 
the variable before the addition of the effect, it can be shown that a certain function of the 
final variables should be normally distributed. 



N, L. Johnson 


151 


Suppose x v x 2 , to be independent random variables, each capable of taking only a small 
range of values near zero. The first sentence of the preceding paragraph can be interpreted 

as meaning that = x 1 + x 2 +... +x n (1) 

will be approximately normal if n is large. The second part of the paragraph means that if 

Y n = x x + x 2 G(Y X ) +... + a; n G(Y n _ x ), (2) 

where 0 is some function, then it is possible to determine a function f(Y n ) which is approxi¬ 
mately normally distributed. This would be the case if 

f(Y n ) = X x + X 2 +... + x n . 


Now if this be so, 
From (2), 

Hence 


f(Y n )-f(Y n _ x ) = 

Yn — lft-1 = @(Y n - r). 

Md) = 1 
Y n -Y»-i G(Y, 


(3) 

W 


Since x n is supposed small, it follows that 


/'(T)==1/G(F). (5) 

Van Uven, in an Appendix to Kapteyn & van Uven (1916), and Baker (1934) have pointed 
out that it is always possible in theory to transform any continuous distribution into a normal 
distribution. VanUven gives a graphic method of doing so, while Baker proposes an approxi¬ 
mation based on the method of moments. In both cases, however, practical difficulties are 
considerable. 

The parallelism of equation (5) and the equation for a function which shall ‘equalize 
variances ’ (when the standard deviation is proportional to the function G of the expected 
value) is notable, and, in fact, equalization of variance and approximate normalization often 
go together. Curtiss (1943) gives a full discussion of these two aspects of certain transforma¬ 
tions used in the analysis of variance. 

Recently, in connexion with problems concerning the distribution of particle sizes, 
Kolmogoroff (1941), Halmos (1944) and Epstein (1947) have developed another theoretical 
basis leading, under certain conditions, to the most common form of the distributions which 
we shall consider (see § 31 below). 


T5. Order of discussion 

In this paper we shall not consider in any further detail theoretical arguments for the use 
of transformations to normality. We shall be concerned, rather, with the study of the pro¬ 
perties of distributions for which simple transformations to normality are possible. 

In§§ 2-1-2-4 the problem will be considered in a general manner. Certain properties which 
are valid for wide classes of transformation will be described and a basis will be developed 
for the discussion of any special system. Three such special systems will be put forward, 
and their properties discussed, in §§ 3T-3-6; bivariate distributions based on these systems 
will be considered in a later paper. 


2. General 'theory 

2-1. Translation as a method of generating systems of frequency curves 
Any curve of the Pearson system of frequency curves is a solution of the differential equation 

1 dy _ a+x 
ydx c 0 +c 1 x + c 2 x 2 ’ 



152 Frequency curves generated by methods of translation 

and is defined by the values of the parameters a, c 0 , c x and c 2 in that equation. Somewhat 
similarly, a curve in the Charlier A system is defined by the values of coefficients in the -well- 
known expansion of derivatives of the normal function. 

If we write a transformation of a variable x to normality in the formal manner 

2 =/(*). 

where z is a unit normal variable, we have, clearly, defined a multiply infinite system of 
frequency curves, corresponding to the possible functions/(a') which might be chosen. In 
order to obtain a system of curves analogous to the Pearson or Charlier systems, f(x) must be 
specialized, preferably in a simple form, and made to depend on a certain number of para¬ 
meters. The values of these parameters will then determine which curve of the system 
represents the distribution of x. 

It is convenient to introduce four parameters (as in the case of the Pearson system) 

and to write lx—£\ 

2 = r + i/|__Sj. (6) 

Here / is, preferably, a function of simple form, depending on no variable parameters. 
/{(£-£)/A} should also be a monotonic function of x. Without loss of generality it will be 
supposed that/{(* —£)/A} is a non-decreasing function of x and that 8 and A are positive. 

From quite general considerations, it is possible to appreciate the roles played by certain 
of the parameters. If we write ^ 

then z^y+8f(y), (7) 

whence p{y) = 8f'{y)p(z) (^ Y+a/Cu) (8-1) 

=^-/(y)exv{-tir+my)n (8-2) 

Equation (8-1) is, of course, of general validity and does not depend on the definition of z 
as a unit normal variable. Since x = £+Ay, it follows that the distribution of x will be of the 
same shape as that of y, which is given, in general, by (8-1). The standard deviation of x will 
be A times that of y, while changes in £ will affect only the expected value (or other central 
measure) of the distribution of a;. 

It follows that the parameters y and 5 determine the shape of the distribution of x, that 
A is a scale factor and £ a location factor. It follows also that attention should be concentrated 
on the relation between the values of y and <5 and the distribution of x, since the parameters 
£ and A affect the distribution only in a simple manner. It will therefore be convenient to 
take as our standard form of transformation 

z = y±8f(y), {Ibis) 

rather than the more complicated expression (6), and to investigate the relation between 
y, 8 and the shape of the distribution of y. 

2-2. Requirements of a translation system 

The system of frequency curves obtained depends on the function f(y) which is chosen. 
For practical convenience this function should possess the following properties: 

(1) It should be a monotonic function of y. 

(2) Apart from being simple in form it should be easy to calculate. Preferably, tables of 
the function should be in existence. 



N. L. Johnson 


153 


(3) The range of values of f(y) corresponding to the actual range of possible values of y 
should be from — oo to + oo. Although good approximation may sometimes be obtained even 
when this requirement is ignored, it is highly desirable that it should be satisfied, since z, 
being a normal variable, is supposed to vary from — oo to + co. 



Median 


Fig. 1 

2-3. General properties of translation systems 

In this section we shall study the transformation z = γ + δf(y), remembering that x is related to y by the linear equation x = ξ + λy. We shall suppose in the first place that z is a standardized variable with a symmetrical distribution. The general properties of the relationship can be most easily appreciated with the help of Fig. 1.* This diagram actually represents the case where z is a standardized normal variable and the function f has the form log{y/(1 − y)}, but it illustrates the general properties of the transformation.

Relative to the base-line ABC, which is parallel to and at a distance γ from the axis of y, the dotted curve has been plotted with ordinates f(y) and abscissae y. For the solid-line

* I am indebted to Prof. E. S. Pearson for suggesting the use of this diagram. 






154 Frequency curves generated by methods of translation 

curve the ordinates, measured from ABC, have been multiplied by δ. As a result, when referred to the axes of y and z, the solid-line curve represents the functional relation between y and z.

The effect of the distortion of the z-scale, due to this relationship, on the distribution of y (or of x) is also illustrated. The shaded columns, equal in area, under the two distribution curves represent the probabilities of z and y (or x) falling in corresponding small intervals Δz and Δy (or Δx). Clearly, where f′(y) has a high value, the contraction on the y-scale due to the transformation is greater than where f′(y) is smaller. The values of γ and δ affect the distribution of y in so far as they determine over what parts of the total range of y these augmentations and diminutions of probability density shall occur, and to which parts of the distribution of z they shall correspond.
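The change of variables just described can be sketched numerically. Assuming z is a unit normal variable and f is increasing (requirement (1) of § 2·2), the density of y is p(y) = δ f′(y) φ{γ + δ f(y)}, where φ is the unit normal density. The function names below are illustrative, and the example uses the transformation of Fig. 1, f(y) = log{y/(1 − y)}.

```python
import math


def normal_pdf(z):
    """Unit normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def translated_density(y, gamma, delta, f, f_prime):
    """Density of y when z = gamma + delta * f(y) is a unit normal variable.

    Assumes f is increasing, so p(y) = delta * f'(y) * phi(gamma + delta * f(y)).
    """
    return delta * f_prime(y) * normal_pdf(gamma + delta * f(y))


def logit(y):
    """The transformation drawn in Fig. 1: f(y) = log{y/(1 - y)}."""
    return math.log(y / (1.0 - y))


def logit_prime(y):
    return 1.0 / (y * (1.0 - y))


# Midpoint rule on (0, 1); the density dies away rapidly at both ends
n = 20000
h = 1.0 / n
total = h * sum(
    translated_density((k + 0.5) * h, 0.5, 1.0, logit, logit_prime) for k in range(n)
)
print(f"integral of p(y) over (0, 1): {total:.6f}")
```

The integral comes out very close to 1: the factor δ f′(y) is exactly what keeps the shaded columns of Fig. 1 equal in area.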

As δ is increased, the range within which observations are likely to be found (e.g. that corresponding to −3 < z < 3) corresponds to a smaller and smaller length of the dotted curve representing f(y), which in the limit may be regarded as linear. Thus if y₀ is defined by

$$0 = \gamma + \delta f(y_0), \tag{9}$$

then y₀ will be the median of the y distribution (since z = 0 is the median of z and the transformation is monotonic) and, further, if δ is sufficiently large, we shall have to a close degree of approximation

$$z \approx \delta (y - y_0) f'(y_0) \tag{10}$$

for the bulk of the distribution of y. (10) may be written

$$y \approx y_0 + \frac{z}{\delta f'(y_0)}. \tag{11}$$

Hence if δ is large, y will have a distribution of approximately the same shape as z. We also note from (11) that an increase in δ may be expected to decrease the standard deviation of y.
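The approximations (9)-(11) can be checked in the log-normal case f(y) = log y of § 3·1, where the exact standard deviation of y follows from the moments (13): sd(y) = exp(−γ/δ){ω(ω − 1)}^{1/2} with ω = exp(δ⁻²). The sketch below (helper names are illustrative) compares this with the large-δ value 1/{δ f′(y₀)} implied by (11).

```python
import math


def median_y(gamma, delta, f_inverse):
    """Median y0 from (9): 0 = gamma + delta * f(y0), so y0 = f^{-1}(-gamma/delta)."""
    return f_inverse(-gamma / delta)


def approx_sd(gamma, delta, f_inverse, f_prime):
    """Large-delta standard deviation of y implied by (11): 1 / {delta * f'(y0)}."""
    y0 = median_y(gamma, delta, f_inverse)
    return 1.0 / (delta * f_prime(y0))


def lognormal_sd(gamma, delta):
    """Exact sd of y when z = gamma + delta * log y, from the moments (13)."""
    omega = math.exp(delta ** -2)
    return math.exp(-gamma / delta) * math.sqrt(omega * (omega - 1.0))


gamma = 0.3
for delta in (1.0, 5.0, 20.0):
    exact = lognormal_sd(gamma, delta)
    approx = approx_sd(gamma, delta, math.exp, lambda y: 1.0 / y)
    print(f"delta={delta:5.1f}  exact={exact:.5f}  approx={approx:.5f}  ratio={approx / exact:.4f}")
```

The ratio approaches 1 as δ grows, which is the sense in which y becomes nearly normal with a standard deviation shrinking like 1/δ.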
We shall now restrict ourselves to a special class of transformation functions. A transformation will be called symmetrical if there is a unique number y* such that

$$f(y^* + y') = -f(y^* - y')$$

for all y′.* It follows that f(y*) = 0. For symmetrical transformations, therefore, y₀ = y* if γ = 0; further, if γ = 0, the distribution of y is symmetrical about y* since the changes in probability density are symmetrical about the centre of the distribution of z. If γ is not zero, the distribution of y is skew. The parameter γ is thus particularly associated with skewness. In general, however, δ also affects skewness, and γ affects the kurtosis. As suggested by Fig. 1, other relations may be traced between (a) the shape of the distribution of y, and (b) the form of f(y) and the magnitude and sign of γ.
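A quick numerical check of the symmetry argument, using the logit transformation of Fig. 1, which is symmetrical about y* = ½. The helper `sb_density` is an illustrative name; the values γ = 0·8, δ = 1·2 are arbitrary.

```python
import math


def sb_density(y, gamma, delta):
    """Density of y when z = gamma + delta * log{y/(1 - y)} is unit normal (0 < y < 1)."""
    f = math.log(y / (1.0 - y))
    f_prime = 1.0 / (y * (1.0 - y))
    return delta * f_prime * math.exp(-0.5 * (gamma + delta * f) ** 2) / math.sqrt(2.0 * math.pi)


u = 0.2
# f(1/2 + u) = -f(1/2 - u): the logit is symmetrical about y* = 1/2
f_sum = math.log((0.5 + u) / (0.5 - u)) + math.log((0.5 - u) / (0.5 + u))
# gamma = 0 gives a density symmetrical about 1/2; gamma != 0 gives a skew one
sym_gap = sb_density(0.5 + u, 0.0, 1.2) - sb_density(0.5 - u, 0.0, 1.2)
skew_gap = sb_density(0.5 + u, 0.8, 1.2) - sb_density(0.5 - u, 0.8, 1.2)
print(f_sum, sym_gap, skew_gap)
```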


2·4. Fitting and errors in fitting

The methods of fitting curves in general use are 

(i) The method of percentile points. 

(ii) The method of moments. 

(iii) The method of maximum likelihood. 

Method (i) is peculiarly suitable for fitting curves of a translation system. The percentile 
points of the distribution of y can easily be expressed in terms of the corresponding points 
of the distribution of z, and these latter will usually be tabled. 
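For method (i), each percentile point of x follows from the corresponding normal point by inverting the transformation. Below is a minimal sketch for the S_B form (19) of § 3·3. It uses Python's `statistics.NormalDist` in place of printed normal tables, and `sb_percentile` is an illustrative name.

```python
import math
from statistics import NormalDist


def sb_percentile(p, gamma, delta, xi=0.0, lam=1.0):
    """100p % point of x when z = gamma + delta * log{(x - xi)/(xi + lam - x)} is unit normal."""
    z_p = NormalDist().inv_cdf(p)                           # normal percentile point
    y_p = 1.0 / (1.0 + math.exp(-(z_p - gamma) / delta))   # invert z = gamma + delta*f(y)
    return xi + lam * y_p                                   # x = xi + lam*y


gamma, delta = 1.0, 2.0
for p in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(p, round(sb_percentile(p, gamma, delta, xi=10.0, lam=5.0), 4))
```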


* The transformation shown in Fig. 1, f(y) = log{y/(1 − y)}, is symmetrical about y = ½.





Should the moments of y be of fairly simple form, method (ii) may be used. If all four parameters γ, δ, ξ and λ are to be estimated, γ and δ are first determined from β₁(x) and β₂(x); then ξ and λ are determined so that agreement in mean and standard deviation is obtained.

Method (iii) is rather difficult to apply to translation systems. However, a method of 
successive approximation can be worked out which, though tedious, is straightforward and 
applicable to all cases of transformation to normality. 

Although the process of fitting reduces to the estimation of γ, δ, ξ and λ, the accuracy of these estimates is usually of less intrinsic interest than the accuracy of probabilities (or expected frequencies) calculated from the fitted curve. For a given form of distribution of z, it is a simple matter to investigate the variation in computed probabilities associated with variation in the values assigned to the parameters, assuming that the correct form of function f{(x − ξ)/λ} has been chosen. A brief study of the effect of an incorrect choice of this function in certain special cases will be given in § 3·6.


3. Special systems 


3·1. The log-normal system

The most common transformation of type (6) is that termed by Gaddum (1945) the log-normal transformation.

If

$$z = \gamma + \delta \log\frac{x - \xi}{\lambda} \tag{12·1}$$

or

$$z = \gamma + \delta \log y, \tag{12·2}$$

z being a unit normal variable, the distribution of x (or of y) is said to be log-normal.

The transformation was proposed by Galton (1879), anticipating the form of argument used by Kapteyn, and some properties of the distribution were obtained by McAlister (1879). Fechner (1897) also used the transformation in a special application, but the idea was not further pursued by these authors. Kapteyn & Van Uven (1916) gave a graphical method of fitting the distribution and investigated its shape. Wicksell (1917) dealt rather more fully with the subject. He pointed out that the log-normal transformation is obtained by putting φ(Y) = Y in (5); that is to say, by assuming random increments proportional to the variable to which they apply. Wicksell also obtained the moments of the distribution of y. We have


$$\mu_r'(y) = \exp\left(\tfrac{1}{2} r^2 \delta^{-2} - r \gamma \delta^{-1}\right). \tag{13}$$

It follows that

$$\beta_1 = (\omega - 1)(\omega + 2)^2, \qquad \beta_2 = \omega^4 + 2\omega^3 + 3\omega^2 - 3, \tag{14}$$

where ω = exp(δ⁻²). The (β₁, β₂) points for log-normal distributions therefore lie on a curve defined by the parametric equations (14). This curve is shown in Fig. 2. This restriction of the locus of (β₁, β₂) is to be expected since (12·1) can be written

$$z = (\gamma - \delta \log \lambda) + \delta \log(x - \xi),$$

so that there are only three independent parameters and, without any loss of generality, (12·1) may be rewritten

$$z = \gamma + \delta \log(x - \xi). \tag{15}$$
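Formulae (13) and (14) can be cross-checked by converting the raw moments (13) into β₁ and β₂ and comparing the result with the closed forms in ω = exp(δ⁻²). The function names in this sketch are illustrative; apart from (13) and (14) it relies only on the standard identities that convert raw moments into central moments.

```python
import math


def raw_moment(r, gamma, delta):
    """(13): mu'_r(y) = exp(r^2/(2 delta^2) - r*gamma/delta) for z = gamma + delta*log y."""
    return math.exp(0.5 * r * r / delta ** 2 - r * gamma / delta)


def lognormal_betas(delta):
    """(14): (beta1, beta2) on the log-normal line, with omega = exp(delta^-2)."""
    w = math.exp(delta ** -2)
    return (w - 1.0) * (w + 2.0) ** 2, w ** 4 + 2.0 * w ** 3 + 3.0 * w ** 2 - 3.0


def betas_from_raw(m1, m2, m3, m4):
    """Convert raw moments to beta1 = mu3^2/mu2^3 and beta2 = mu4/mu2^2."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3
    mu4 = m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2


delta = 2.0
for gamma in (-1.0, 0.4):   # gamma only rescales y = exp{(z - gamma)/delta}, so the betas agree
    print(gamma, betas_from_raw(*(raw_moment(r, gamma, delta) for r in (1, 2, 3, 4))))
print("(14):", lognormal_betas(delta))
```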




Wicksell also proposed a method of fitting log-normal distributions based on the observed moments m′₁, m₂ and m₃ of the distribution of x. The positive root of the equation

$$t^3 + 3t - \sqrt{b_1} = 0 \tag{16}$$

(where b₁ = m₃²/m₂³) is found. Here t estimates the coefficient of variation of x − ξ, for which t² = ω − 1 and, by (14), √β₁ = t³ + 3t. ξ, in (15), is then estimated by means of the formula ξ = m′₁ − √m₂/t. Estimates of γ and δ are then obtained quite straightforwardly from (13). Yuan (1933) gave tables to facilitate the solution of (16). Quensel (1945) has given expressions for the standard deviations of estimates obtained by this method. Finney (1941) pointed out that the mean and variance of the transformed variable log(x − ξ) should be used if efficient estimates of γ and δ are to be obtained, and obtained expressions for such estimates. These could not, of course, be applied directly if ξ is unknown.
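Wicksell's procedure can be sketched as follows, assuming m₂ and m₃ are central moments and m₃ > 0. The function name is illustrative. The positive root of (16) is found by bisection rather than from Yuan's tables, and the check feeds in exact moments of a known curve instead of sample moments.

```python
import math


def wicksell_fit(m1, m2, m3):
    """Fit z = gamma + delta * log(x - xi) from the mean m1 and central moments m2, m3.

    Solves t^3 + 3t - sqrt(b1) = 0 (16), b1 = m3^2/m2^3, then xi = m1 - sqrt(m2)/t.
    Assumes m3 > 0 (positive skewness), as a log-normal curve requires.
    """
    sqrt_b1 = m3 / m2 ** 1.5
    lo, hi = 0.0, max(1.0, sqrt_b1)       # t^3 + 3t is increasing: bracket the root
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid ** 3 + 3.0 * mid < sqrt_b1:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    xi = m1 - math.sqrt(m2) / t
    omega = 1.0 + t * t                   # t is the coefficient of variation of x - xi
    delta = 1.0 / math.sqrt(math.log(omega))
    # mean of x - xi is exp(1/(2 delta^2) - gamma/delta): (13) with r = 1
    gamma = delta * (0.5 / delta ** 2 - math.log(m1 - xi))
    return xi, gamma, delta


# Exact moments of the curve with xi = 2, gamma = 0.5, delta = 1.5
w = math.exp(1.5 ** -2)
mean_y = math.exp(0.5 / 1.5 ** 2 - 0.5 / 1.5)
m1 = 2.0 + mean_y
m2 = mean_y ** 2 * (w - 1.0)
m3 = mean_y ** 3 * (w - 1.0) ** 2 * (w + 2.0)
print(wicksell_fit(m1, m2, m3))
```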

If y is log-normal,

$$p(y) = \frac{\delta}{y\sqrt{2\pi}} \exp\left\{-\tfrac{1}{2}(\gamma + \delta \log y)^2\right\} \qquad (0 < y). \tag{17}$$

Yuan pointed out that this distribution has infinitely high contact at either end of its range of variation, since

$$\lim_{y \to 0} y^{-n} p(y) = 0 = \lim_{y \to \infty} y^{n} p(y)$$

for all values of n.

The log-normal system has proved useful in a number of applications. We may mention 
its use in dosage-mortality problems (e.g. Gaddum, 1945) in the graduation of economic data 
(Gibrat, 1931; Frechet, 1945) and in agriculture (Cochran, 1938). Williams (1937, 1940) has 
applied the system to a varied collection of problems. 

3·2. Extension of the logarithmic type of transformation

Despite its successful application in a number of cases, the log-normal system is restricted in flexibility, just as is Pearson's Type III distribution, because the associated (β₁, β₂) point must lie on the curve defined by equation (14). It seems reasonable to suppose that useful extensions of the system might be obtained by using different functions f(y) in (7) (or f{(x − ξ)/λ} in (6)). We shall now consider the construction of such new systems, and will start by laying down certain properties which it appears desirable that they should possess.

(i) In order to avoid a restricted locus of variation for (β₁, β₂), the function f should be such that in equation (6) there are four truly independent parameters.

(ii) The new systems should fit in naturally with the log-normal system, which could be regarded as a transition form lying between two systems of distributions, one with a range of variation bounded at both extremities, the other unbounded at either extremity. By analogy with the Pearson system, it is to be expected that in the (β₁, β₂) plane the system with a bounded range of variation will cover the region between the log-normal line and the limiting line β₂ − β₁ − 1 = 0, while the other system will cover the remainder of the (β₁, β₂) plane.

These regions are indicated in Fig. 2, wherein are also introduced the symbols S_L for ‘log-normal system’, S_B for ‘bounded system’ and S_U for ‘unbounded system’, which will be used in the remainder of this paper. It may be noted that the scheme of curves sketched above is not strictly analogous to the Pearson system, as in the latter there is a region in the (β₁, β₂) plane corresponding to a range of variation bounded at one end only (Type VI).

(iii) Finally, in the choice of the function f(y) the considerations detailed in § 2·2 must be kept in mind, and the requirements therein scheduled satisfied as far as is possible.





3·3. Choice of new transformation functions

Consider the log-normal variable x, defined by

$$z = \gamma + \delta \log(x - \xi) \qquad (\xi < x).$$

Putting y = 1 − ξ/x (with ξ > 0), we have

$$z = (\gamma + \delta \log \xi) + \delta \log\frac{y}{1 - y} \qquad (0 < y < 1). \tag{18}$$

[Fig. 2. The (β₁, β₂) plane, showing the log-normal line S_L between the regions of systems S_B and S_U, together with the Pearson Type III and Type V lines and the boundary of bimodal curves of system S_B.]

A transformation of this type is also obtained from the general formula (7) by putting f(y) = log{y/(1 − y)}. Putting y = (x − ξ)/λ, we have as a particular case of (6)

$$z = \gamma + \delta \log\frac{x - \xi}{\xi + \lambda - x} \qquad (\xi < x < \xi + \lambda). \tag{19}$$

The system of curves generated by (18) or (19), z being a unit normal variable, will be our system S_B.







We may note that the proposed function f(y) satisfies the conditions laid down in § 2·2, that it should be simple and calculable without undue difficulty. In fact

$$f(y) = \log\frac{y}{1 - y} = 2\tanh^{-1}(2y - 1). \tag{20}$$

Tables of inverse hyperbolic tangents (Milne-Thomson & Comrie, 1931) may therefore be used to evaluate f(y). We further note that f(y) has the desirable property that it increases from −∞ to +∞ as y increases from 0 to 1.

The transformation (19) was suggested by Wicksell (1917) as possibly being worthy of study. Bartlett (1937) has stated that (19) proves useful in certain analysis-of-variance problems. It may also be noted that Fisher's z′ transformation for the correlation coefficient is of form (19) with

$$\xi = -1, \quad \lambda = 2, \quad \gamma = -\tfrac{1}{2}\sqrt{n - 3}\,\log\frac{1 + \rho}{1 - \rho}, \quad \delta = \tfrac{1}{2}\sqrt{n - 3},$$

where n = sample size, ρ = population correlation coefficient and x = r, the sample correlation.
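Both the identity (20) and the Fisher z′ parametrization can be verified numerically. The sketch assumes that z′ is used in its usual standardized form √(n − 3)(tanh⁻¹ r − tanh⁻¹ ρ), which it writes in the S_B form (19) with the parameter values just given. Function names and test values are illustrative.

```python
import math


def logit_via_atanh(y):
    """Right-hand side of (20): 2 tanh^{-1}(2y - 1)."""
    return 2.0 * math.atanh(2.0 * y - 1.0)


def fisher_as_sb(r, rho, n):
    """Standardized z' written as (19) with xi = -1, lambda = 2."""
    xi, lam = -1.0, 2.0
    gamma = -0.5 * math.sqrt(n - 3) * math.log((1 + rho) / (1 - rho))
    delta = 0.5 * math.sqrt(n - 3)
    return gamma + delta * math.log((r - xi) / (xi + lam - r))


def fisher_direct(r, rho, n):
    """The usual standardized form sqrt(n - 3) * (atanh r - atanh rho)."""
    return math.sqrt(n - 3) * (math.atanh(r) - math.atanh(rho))


for y in (0.1, 0.5, 0.93):
    print(y, math.log(y / (1 - y)), logit_via_atanh(y))
print(fisher_as_sb(0.42, 0.3, 28), fisher_direct(0.42, 0.3, 28))
```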

The construction of our system S_U is rather more arbitrary, though suggested by analogy with S_L and S_B. The transformation by which we shall define S_U is

$$z = \gamma + \delta \log\left[\frac{x - \xi}{\lambda} + \left\{\left(\frac{x - \xi}{\lambda}\right)^2 + 1\right\}^{1/2}\right] \tag{21}$$

or

$$z = \gamma + \delta \sinh^{-1} y = \gamma + \delta \log\left[y + \sqrt{y^2 + 1}\right]. \tag{22}$$

Milne-Thomson & Comrie's tables of inverse hyperbolic sines may be used to evaluate f(y) in this case. As required in § 2·2, f(y) increases from −∞ to +∞ as y increases from −∞ to +∞. Beall (1942) and Bartlett (1947) have suggested the use of the function sinh⁻¹√y or sinh⁻¹√(y + ξ), especially with reference to negative binomial variables.
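Here is a sketch of the S_U transformation (21)-(22) and its inverse, the inverse being derived as (34) in § 3·5. It uses the standard-library `math.asinh` and `math.sinh`, and the function names and parameter values are illustrative. It also confirms numerically that sinh⁻¹y = log[y + √(y² + 1)], and that z = 0 maps to the median x = ξ − λ sinh(γ/δ).

```python
import math


def su_z(x, gamma, delta, xi=0.0, lam=1.0):
    """(21): z = gamma + delta * sinh^{-1}{(x - xi)/lam}."""
    return gamma + delta * math.asinh((x - xi) / lam)


def su_x(z, gamma, delta, xi=0.0, lam=1.0):
    """Inverse of (21): x = xi + lam * sinh{(z - gamma)/delta}."""
    return xi + lam * math.sinh((z - gamma) / delta)


gamma, delta, xi, lam = 0.7, 1.4, 3.0, 2.0
for x in (-5.0, 3.0, 12.0):
    z = su_z(x, gamma, delta, xi, lam)
    print(x, z, su_x(z, gamma, delta, xi, lam))
print("median:", su_x(0.0, gamma, delta, xi, lam))
```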


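3·4. The system S_B

From (19) we have immediately

$$p(y) = \frac{\delta}{\sqrt{2\pi}\, y(1 - y)} \exp\left[-\tfrac{1}{2}\left\{\gamma + \delta \log\frac{y}{1 - y}\right\}^2\right] \qquad (0 < y < 1). \tag{23}$$

Hence

$$p(y) = \frac{\delta e^{-\frac{1}{2}\gamma^2}}{\sqrt{2\pi}}\, y^{-(\gamma\delta + 1)} (1 - y)^{\gamma\delta - 1} \exp\left[-\tfrac{1}{2}\delta^2 \left\{\log\frac{y}{1 - y}\right\}^2\right],$$

so that

$$\lim_{y \to 0} y^{-n} p(y) = 0 = \lim_{y \to 1} (1 - y)^{-n} p(y)$$

for any value of n. The distribution curve of y therefore has ‘high contact’ at either end of its finite range of variation.

Inverting (19) we have

$$y = \left\{1 + e^{-(z - \gamma)/\delta}\right\}^{-1}. \tag{24}$$

Hence the median value of y is (1 + e^{γ/δ})⁻¹. The equation to be satisfied by any modal value of y, other than the extremities of the range of variation, is

$$2y - 1 = \delta\left(\gamma + \delta \log\frac{y}{1 - y}\right). \tag{25}$$

Putting y = ½(y′ + 1),

$$y' - \gamma\delta = \delta^2 \log\frac{1 + y'}{1 - y'}. \tag{25·1}$$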





The number of intersections of the straight line u = y′ − γδ and the curve

$$u = \delta^2 \log\frac{1 + y'}{1 - y'} \tag{25·2}$$

determines whether the distribution of y is or is not bimodal. If there is only one intersection, the distribution is unimodal; if there are three intersections, it is bimodal. Supposing, for the moment, that γ > 0, there is clearly one intersection in the interval −1 < y′ < 0. There may be two other intersections. These must be in the interval 0 < y′ < 1 if they exist. In the limiting case, the straight line in (25·1) will touch the curve at some point in the interval 0 < y′ < 1. At this point the slopes of line and curve must be equal, so that

$$1 = 2\delta^2 (1 - y'^2)^{-1}, \quad \text{i.e.} \quad y' = \sqrt{1 - 2\delta^2}.$$

Hence the line will touch the curve at this value of y′ if

$$\sqrt{1 - 2\delta^2} - \gamma\delta = 2\delta^2 \tanh^{-1}\sqrt{1 - 2\delta^2},$$

i.e.

$$\gamma = \delta^{-1}\left[\sqrt{1 - 2\delta^2} - 2\delta^2 \tanh^{-1}\sqrt{1 - 2\delta^2}\right].$$

It follows that the necessary and sufficient conditions for bimodality (whatever the sign of γ) are

$$\delta < 1/\sqrt{2}, \qquad |\gamma| < \delta^{-1}\sqrt{1 - 2\delta^2} - 2\delta \tanh^{-1}\sqrt{1 - 2\delta^2}. \tag{26}$$

Table 1 shows the limiting values of |γ| for various values of δ.


Table 1

δ       Maximum |γ|        δ       Maximum |γ|
0·7     0·0027             0·3     2·12
0·6     0·176              0·2     4·02
0·5     0·533              0·1     9·37
0·4     1·12
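Condition (26) and Table 1 can be reproduced directly. The mode count can also be checked through the sign changes of g(y′) = y′ − γδ − δ² log{(1 + y′)/(1 − y′)}. This function has the sign of dp/dy: multiply the derivative of log p(y) from (23) by y(1 − y) and substitute y = ½(y′ + 1). Function names are illustrative, and the grid search is only a crude check, not a method from the text.

```python
import math


def sb_bimodal_limit(delta):
    """Largest |gamma| for which S_B is bimodal, from (26); None when delta >= 1/sqrt(2)."""
    if delta >= 1.0 / math.sqrt(2.0):
        return None
    a = math.sqrt(1.0 - 2.0 * delta ** 2)
    return a / delta - 2.0 * delta * math.atanh(a)


def count_modes(gamma, delta, n=20000):
    """Count interior modes: places where g(y') changes sign from + to -."""
    count, prev = 0, None
    for k in range(1, n):
        yp = -1.0 + 2.0 * k / n
        g = yp - gamma * delta - delta ** 2 * math.log((1.0 + yp) / (1.0 - yp))
        if prev is not None and prev > 0.0 >= g:
            count += 1
        prev = g
    return count


for d in (0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1):
    print(d, round(sb_bimodal_limit(d), 4))
print(count_modes(0.4, 0.5), count_modes(0.7, 0.5))   # inside / outside the limit 0.533
```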




Figs. 3 and 4 show the limiting curves γ = 0, δ = 1/√2 and γ = 0·533, δ = 0·5. Clearly these limiting curves will have a nearly flat horizontal portion with an inflexion at y′ = √(1 − 2δ²), i.e. at y = ½[1 + √(1 − 2δ²)]. We also note that, if δ be fixed, as γ increases, the ‘permanent’ mode is always below y = ½[1 − √(1 − 2δ²)], the antimode, when present, is between ½[1 − √(1 − 2δ²)] and ½[1 + √(1 − 2δ²)], and the secondary mode is above ½[1 + √(1 − 2δ²)]. Fig. 5 shows a symmetrical bimodal distribution, while Figs. 6, 7 and 8 show typical unimodal curves of S_B. The boundary above which (β₁, β₂) points correspond to bimodal curves has been shown in Fig. 2, as far as it has been explored.

The moments of the distribution of y are complicated in form. They are discussed in the Appendix, where some numerical values are given. It is also shown in the Appendix that the (β₁, β₂) points of curves of the system S_B cover the area between the log-normal line and the straight line β₂ − β₁ − 1 = 0. Until sufficiently comprehensive tables are available, it is clear that it will not be possible to fit curves of type S_B by the method of moments. For practical purposes the method of percentiles is the most convenient to use, though in certain special cases it will be possible to apply the method of maximum likelihood in quite a simple manner.

The process of fitting is considerably simplified if one or both end-points of the distribution of x are known. We shall deal with the three cases of (a) both, (b) one, (c) neither of the end-points known.




(a) Both end-points known

In this case, both ξ and λ are known. Hence, given the value of x, the value of the transformed variable f = log{(x − ξ)/(ξ + λ − x)} can be obtained directly. Corresponding to observed values x₁, x₂, ..., xₙ there will be transformed values f₁, f₂, ..., fₙ, where

$$f_i = \log\frac{x_i - \xi}{\xi + \lambda - x_i} \qquad (i = 1, 2, \ldots, n).$$

[Figs. 3-5. Curves of system S_B; the recoverable panel labels include γ = 0, δ = 1/√2 and γ = 0, δ = 0·5.]


The problem then reduces to that of fitting a normal curve to the observed f's. Fitting this curve by moments we have

$$\gamma = -\bar{f}/s_f, \qquad \delta = 1/s_f,$$

where

$$\bar{f} = n^{-1}\sum_{i=1}^{n} f_i, \qquad n s_f^2 = \sum_{i=1}^{n}(f_i - \bar{f})^2.$$

This will give the maximum likelihood estimates for γ and δ.
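Here is a sketch of case (a), assuming ξ and λ are known and the data are given in extenso. `statistics.pstdev` uses the divisor n, which matches s_f above and hence the maximum likelihood estimates. The function name is illustrative, and the check uses two artificial points built from known γ and δ.

```python
import math
from statistics import fmean, pstdev


def fit_sb_known_ends(xs, xi, lam):
    """Estimates of gamma and delta for S_B when both end-points xi and xi + lam are known."""
    f = [math.log((x - xi) / (xi + lam - x)) for x in xs]   # transformed values f_i
    f_bar = fmean(f)
    s_f = pstdev(f)                                          # divisor n
    return -f_bar / s_f, 1.0 / s_f


# Artificial data: z-values -1 and +1 have mean 0 and (divisor-n) standard deviation 1
gamma, delta, xi, lam = 0.3, 1.7, 2.0, 5.0
xs = [xi + lam / (1.0 + math.exp(-(z - gamma) / delta)) for z in (-1.0, 1.0)]
print(fit_sb_known_ends(xs, xi, lam))
```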

However, a difficulty arises if the original data are not given in extenso but as a grouped distribution. If the original groups (for the variable x) are of equal length, the transformed groups (for the variable f) will be of unequal length and there will be groups of infinite length at either end of the distribution. Moments calculated from such data would require corrections which would be difficult to ascertain. The method of percentiles is very simple to apply in this case. An application of the method is described in Example 1 (pp. 168, 169 below).


[Figs. 6-8. Typical unimodal curves of system S_B; one panel is labelled γ = 1, δ = 2 (β₁ = 0·081, β₂ = 2·77).]


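(b) One end-point known

Suppose the lower end-point, i.e. ξ, to be known. In this case a convenient application of the method of percentiles to estimate λ, γ and δ is as follows.

From the data we estimate the median x₀, and the lower and upper 100P % points x₁ and x₂. Then we have to estimate λ, γ and δ from the equations

$$-z_P = \gamma + \delta \log\frac{x_1 - \xi}{\xi + \lambda - x_1}, \qquad 0 = \gamma + \delta \log\frac{x_0 - \xi}{\xi + \lambda - x_0}, \qquad z_P = \gamma + \delta \log\frac{x_2 - \xi}{\xi + \lambda - x_2}, \tag{28}$$

where

$$\frac{1}{\sqrt{2\pi}} \int_{z_P}^{\infty} e^{-\frac{1}{2}t^2}\, dt = P.$$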


Biometrika 36









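From equations (28) (adding the first and third and subtracting twice the second eliminates both γ and δ) we obtain

$$X_0^2 (\lambda - X_1)(\lambda - X_2) = X_1 X_2 (\lambda - X_0)^2, \tag{29}$$

whence

$$\lambda = \frac{X_0\{X_0(X_1 + X_2) - 2X_1 X_2\}}{X_0^2 - X_1 X_2}, \tag{30}$$

where X_i = x_i − ξ (i = 0, 1, 2).

γ and δ may then be found from (28). Alternatively, using the value of λ obtained from (30), γ and δ may be estimated from the observed mean and standard deviation of

$$\log\frac{x - \xi}{\xi + \lambda - x},$$

as in case (a).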
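Case (b) can be put in code as follows. Equation (30) gives λ in closed form. δ then follows from (28) by subtracting the first equation from the third, and γ from the second equation. The function names are illustrative, `statistics.NormalDist().inv_cdf` stands in for the normal tables, and the check feeds in exact percentile points of a known S_B curve. Formula (30) breaks down when X₀² = X₁X₂, which is the log-normal case (λ infinite).

```python
import math
from statistics import NormalDist


def fit_sb_lower_end(x0, x1, x2, xi, p):
    """Percentile fit of S_B when the lower end-point xi is known, following (28)-(30).

    x0 is the median and x1, x2 the lower and upper 100p % points of x.
    Returns (lam, gamma, delta).
    """
    X0, X1, X2 = x0 - xi, x1 - xi, x2 - xi
    lam = X0 * (X0 * (X1 + X2) - 2.0 * X1 * X2) / (X0 ** 2 - X1 * X2)   # (30)
    z_p = NormalDist().inv_cdf(1.0 - p)       # upper 100p % point of the unit normal
    # third minus first equation of (28): 2 z_p = delta * (f2 - f1)
    delta = 2.0 * z_p / math.log(X2 * (lam - X1) / (X1 * (lam - X2)))
    gamma = -delta * math.log(X0 / (lam - X0))   # second equation of (28)
    return lam, gamma, delta


def point(z, xi=1.0, lam=4.0, gamma=0.6, delta=1.3):
    """x corresponding to a given z on an S_B curve with illustrative parameters."""
    return xi + lam / (1.0 + math.exp(-(z - gamma) / delta))


p = 0.1
z_p = NormalDist().inv_cdf(1.0 - p)
print(fit_sb_lower_end(point(0.0), point(-z_p), point(z_p), xi=1.0, p=p))
```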

(c) Neither end-point known

In this case all four parameters ξ, λ, γ and δ have to be estimated. The method of percentile points in this case requires that estimates be obtained of four values x_A, x_B, x_C, x_D, say, such that certain fixed proportions P_A, P_B, P_C, P_D, respectively, of the distribution of x fall below these values. ξ, λ, γ and δ have then to be found from the equations

$$z_k = \gamma + \delta \log\frac{x_k - \xi}{\xi + \lambda - x_k} \qquad (k = A, B, C, D), \tag{31}$$

where

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z_k} e^{-\frac{1}{2}t^2}\, dt = P_k. \tag{32}$$

These equations may be solved by successive approximation. In Example 2 (pp. 169-171 below) only an approximate solution has been obtained. The values shown could be improved by the standard method based on Taylor's expansion.
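The method based on Taylor's expansion is Newton-Raphson iteration. Below is a sketch for (31). It assumes a starting point inside the feasible region (ξ < x_k < ξ + λ, δ > 0) and writes out the Jacobian analytically. The step-halving safeguard is an addition of this sketch, not part of the text. Names are illustrative, and the check uses exact percentile points of an arbitrary known curve at the 5, 30, 70 and 95 % levels used in Example 2.

```python
import math
from statistics import NormalDist


def residuals(params, xs, zs):
    """Left-hand minus right-hand side of (31) at each of the four points."""
    xi, lam, gamma, delta = params
    return [gamma + delta * math.log((x - xi) / (xi + lam - x)) - z for x, z in zip(xs, zs)]


def jacobian(params, xs):
    """Partial derivatives of the residuals with respect to (xi, lam, gamma, delta)."""
    xi, lam, _, delta = params
    rows = []
    for x in xs:
        a, b = x - xi, xi + lam - x            # distances to the two end-points
        rows.append([-delta * (1.0 / a + 1.0 / b), -delta / b, 1.0, math.log(a / b)])
    return rows


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a.x = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    out = [0.0] * n
    for r in range(n - 1, -1, -1):
        out[r] = (m[r][n] - sum(m[r][k] * out[k] for k in range(r + 1, n))) / m[r][r]
    return out


def norm(v):
    return math.sqrt(sum(t * t for t in v))


def feasible(params, xs):
    """All points strictly inside (xi, xi + lam) and delta positive."""
    xi, lam, _, delta = params
    return delta > 0.0 and all(xi < x < xi + lam for x in xs)


def fit_sb_four_points(xs, ps, start, iterations=100, tol=1e-12):
    """Newton-Raphson for (31)-(32), halving any step that fails to reduce the residual."""
    zs = [NormalDist().inv_cdf(p) for p in ps]            # (32)
    params = list(start)
    for _ in range(iterations):
        size = norm(residuals(params, xs, zs))
        if size < tol:
            break
        step = solve(jacobian(params, xs), [-v for v in residuals(params, xs, zs)])
        t = 1.0
        while t > 1e-10:
            trial = [p + t * s for p, s in zip(params, step)]
            if feasible(trial, xs) and norm(residuals(trial, xs, zs)) < size:
                break
            t *= 0.5
        else:
            break                                         # no further progress possible
        params = trial
    return params


def point(z, xi=10.0, lam=30.0, gamma=0.8, delta=1.4):
    """x corresponding to z on an illustrative S_B curve."""
    return xi + lam / (1.0 + math.exp(-(z - gamma) / delta))


ps = (0.05, 0.30, 0.70, 0.95)
xs = [point(NormalDist().inv_cdf(p)) for p in ps]
print(fit_sb_four_points(xs, ps, start=(9.5, 31.0, 0.75, 1.35)))
```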


3·5. The system S_U

From (22) we have immediately

$$p(y) = \frac{\delta}{\sqrt{2\pi}\sqrt{y^2 + 1}} \exp\left[-\tfrac{1}{2}\left\{\gamma + \delta \log\left[y + \sqrt{y^2 + 1}\right]\right\}^2\right]. \tag{33}$$

Evidently yⁿp(y) → 0 as y → −∞ or y → +∞, so that there is ‘high contact’ at either end of the infinite range of variation of y.

Inverting (22) we have

$$y = \tfrac{1}{2}\left\{e^{(z - \gamma)/\delta} - e^{-(z - \gamma)/\delta}\right\} = \sinh\frac{z - \gamma}{\delta}. \tag{34}$$

Hence the median value of y is −sinh(γ/δ).

From (33) the equation to be satisfied by any modal value of y, other than the extremities of the range of variation, is

$$\frac{y}{\sqrt{1 + y^2}} = -\delta\left\{\gamma + \delta \log\left[y + \sqrt{y^2 + 1}\right]\right\}. \tag{35}$$

From graphical considerations it is evident that there is only one solution of (35), and that this solution is between the median and zero. Hence when γ is positive the mode is greater than the median, implying negative skewness, and vice versa. Since the transformation generating system S_U is symmetrical about y = 0, and f′(y) = (y² + 1)^{−1/2} is a decreasing function of |y|, this is to be expected.
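The ‘graphical consideration’ can be made explicit. Define g(y) = y/√(1 + y²) + δ{γ + δ sinh⁻¹y}. It is strictly increasing in y, and d log p/dy = −g(y)/√(1 + y²), so the single root of g is the mode. A minimal bisection sketch between the median and zero follows; the names and the values γ = 1, δ = 2 are illustrative.

```python
import math


def su_density(y, gamma, delta):
    """(33): density of y when z = gamma + delta * sinh^{-1} y is unit normal."""
    z = gamma + delta * math.asinh(y)
    return delta * math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * math.sqrt(y * y + 1.0))


def su_mode(gamma, delta):
    """Root of (35), found by bisection between the median -sinh(gamma/delta) and zero."""
    def g(y):
        # increasing in y; negative below the mode, positive above it
        return y / math.sqrt(1.0 + y * y) + delta * (gamma + delta * math.asinh(y))

    median = -math.sinh(gamma / delta)
    lo, hi = min(median, 0.0), max(median, 0.0)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


gamma, delta = 1.0, 2.0
print("median", -math.sinh(gamma / delta), "mode", su_mode(gamma, delta))
```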

Figs. 9-11 show typical curves of system S_U.





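The moments of the system S_U are determined with much greater facility than are those of S_B. We have

$$\mu_r'(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}z^2} \left[\tfrac{1}{2}\left\{e^{(z - \gamma)/\delta} - e^{-(z - \gamma)/\delta}\right\}\right]^r dz.$$

Hence if r is even

$$\mu_r' = 2^{-(r-1)} \sum_{s=0}^{\frac{1}{2}r - 1} (-1)^s \binom{r}{s} \omega^{\frac{1}{2}(r - 2s)^2} \cosh\left[(r - 2s)\frac{\gamma}{\delta}\right] + 2^{-r}(-1)^{\frac{1}{2}r}\binom{r}{\frac{1}{2}r}, \tag{36·1}$$

and if r is odd

$$\mu_r' = 2^{-(r-1)} \sum_{s=0}^{\frac{1}{2}(r-1)} (-1)^{s+1} \binom{r}{s} \omega^{\frac{1}{2}(r - 2s)^2} \sinh\left[(r - 2s)\frac{\gamma}{\delta}\right]. \tag{36·2}$$

From equations (36) it follows that

$$\begin{aligned}
\mu_1' &= -\omega^{1/2} \sinh\Omega, \\
\mu_2 &= \tfrac{1}{2}(\omega - 1)(\omega\cosh 2\Omega + 1), \\
\mu_3 &= -\tfrac{1}{4}\omega^{1/2}(\omega - 1)^2\{\omega(\omega + 2)\sinh 3\Omega + 3\sinh\Omega\}, \\
\mu_4 &= \tfrac{1}{8}(\omega - 1)^2\{\omega^2(\omega^4 + 2\omega^3 + 3\omega^2 - 3)\cosh 4\Omega + 4\omega^2(\omega + 2)\cosh 2\Omega + 3(2\omega + 1)\},
\end{aligned} \tag{37}$$

where ω = exp(δ⁻²) and Ω = γ/δ.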
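The closed forms (37) can be checked against a direct numerical integration over z, using Simpson's rule on a truncated range. The check also confirms the symmetric-case value β₂ = ½(ω⁴ + 2ω² + 3) quoted below. Function names and the values γ = 0·5, δ = 1·5 are illustrative.

```python
import math


def su_moments(gamma, delta):
    """Closed forms (37): mean and central moments mu2, mu3, mu4 of y = sinh{(z - gamma)/delta}."""
    w = math.exp(delta ** -2)
    om = gamma / delta
    mean = -math.sqrt(w) * math.sinh(om)
    mu2 = 0.5 * (w - 1.0) * (w * math.cosh(2.0 * om) + 1.0)
    mu3 = (-0.25 * math.sqrt(w) * (w - 1.0) ** 2
           * (w * (w + 2.0) * math.sinh(3.0 * om) + 3.0 * math.sinh(om)))
    mu4 = 0.125 * (w - 1.0) ** 2 * (
        w ** 2 * (w ** 4 + 2.0 * w ** 3 + 3.0 * w ** 2 - 3.0) * math.cosh(4.0 * om)
        + 4.0 * w ** 2 * (w + 2.0) * math.cosh(2.0 * om)
        + 3.0 * (2.0 * w + 1.0))
    return mean, mu2, mu3, mu4


def su_moments_numeric(gamma, delta, n=4000, zmax=12.0):
    """The same quantities by Simpson's rule over z, as an independent check."""
    h = 2.0 * zmax / n
    raw = [0.0] * 5
    for k in range(n + 1):
        z = -zmax + k * h
        weight = 1.0 if k in (0, n) else (4.0 if k % 2 else 2.0)
        y = math.sinh((z - gamma) / delta)
        dens = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        for r in range(5):
            raw[r] += weight * dens * y ** r
    m1, m2, m3, m4 = (v * h / 3.0 for v in raw[1:])
    return (m1, m2 - m1 ** 2, m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3,
            m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4)


closed = su_moments(0.5, 1.5)
numeric = su_moments_numeric(0.5, 1.5)
for name, a, b in zip(("mean", "mu2", "mu3", "mu4"), closed, numeric):
    print(f"{name}: {a:.8f}  {b:.8f}")
```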


From (37) we see that if γ is positive the inequalities mean < median < mode hold, while if γ is negative the direction of the inequalities is reversed. Also when γ = 0, β₁ = 0 (as should be the case) and β₂ = ½(ω⁴ + 2ω² + 3). As γ tends to infinity, δ remaining fixed, we have

$$\lim_{\gamma \to \infty} \beta_1 = (\omega - 1)(\omega + 2)^2, \qquad \lim_{\gamma \to \infty} \beta_2 = \omega^4 + 2\omega^3 + 3\omega^2 - 3. \tag{38}$$

As γ increases from zero to infinity, therefore, the (β₁, β₂) point varies from (0, ½(ω⁴ + 2ω² + 3)) to a point on the S_L line.

As δ decreases from infinity to zero, ω increases from one to infinity. Hence the (β₁, β₂) points for system S_U cover the region of the (β₁, β₂) plane ‘below’ the S_L line, as sketched in Fig. 2.

The calculations involved not being too lengthy, fairly extensive series of values of the mean, standard deviation, β₁ and β₂ for distributions of system S_U were computed. They are not reproduced here, but Fig. 12 is an abac based on these calculations. Using this abac the parameters γ and δ can be estimated, and ξ and λ then determined to give the required mean and standard deviation (cf. Burr, 1942). Given γ and δ, it is not difficult to calculate ξ and λ from (37). No further tables or abacs are, therefore, given. In the case of system S_B, however, where μ′₁ and σ are not easily calculated, a second abac, giving μ′₁ and σ as functions of γ and δ, would be required.
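Once γ and δ have been read from the abac, ξ and λ follow from (37) by matching the mean and standard deviation of x = ξ + λy. Below is a minimal sketch; the function name and the example values (γ = −0·7, δ = 1·8, target mean 30, standard deviation 4) are illustrative.

```python
import math


def su_location_scale(gamma, delta, mean_x, sd_x):
    """xi and lambda so that x = xi + lambda*y has the required mean and sd.

    Uses mu'_1 = -omega^{1/2} sinh(Omega) and mu_2 = (omega - 1)(omega cosh 2 Omega + 1)/2
    from (37), with omega = exp(delta^-2) and Omega = gamma/delta.
    """
    w = math.exp(delta ** -2)
    om = gamma / delta
    mean_y = -math.sqrt(w) * math.sinh(om)
    var_y = 0.5 * (w - 1.0) * (w * math.cosh(2.0 * om) + 1.0)
    lam = sd_x / math.sqrt(var_y)
    xi = mean_x - lam * mean_y
    return xi, lam


print(su_location_scale(-0.7, 1.8, 30.0, 4.0))
```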

It may be noted that it is in the system S_U that the necessity for estimation of all four parameters is likely to be of most frequent occurrence. In the case of S_L and S_B, ξ (also λ in S_B) often has an obvious and simple meaning. ξ (and λ) may often be fixed in advance in such cases. In general there is no such simple interpretation of ξ and λ in the case of S_U. There is, indeed, the particular result that for the symmetrical curves of S_U, x = ξ is the axis of symmetry. Otherwise, the relation of ξ and λ to the position and size of the curves is not simple.

Examples 3 and 4 (pp. 171, 172 below) describe the fitting of curves of system S_U to observational data. It appears that the curves give a good approximation to Pearson Type IV curves. As S_U is much easier to deal with than Type IV, especially with regard to the computation of probabilities, it seems that S_U might be used as an approximation to Type IV even when the latter is considered the more reasonable curve to fit.


[Fig. 12. Abac for system S_U, used to estimate γ and δ.]


3·6. The transformation applied to certain Pearson curves

Since experience has shown that curves of the Pearson system are representative of a wide range of frequency distributions met in practice, it is of interest to ask how far the application of the S_L, S_B and S_U transformations to variables following distributions of this system will result in a transformed variable following approximately the normal law. We shall suppose, then, that y follows a probability law of the Pearson system and compare β₁(y), β₂(y) and β₁(z), β₂(z) with the normal values 0, 3 respectively.

Some of the results obtained in this section coincide with those which Aroian (1941) and 
Wishart (1947) obtained in the course of work on the distribution of statistics employed in 
analysis of variance. 

(i) S_L applied to Type III

If

$$p(y) = \frac{1}{\Gamma(\nu)} y^{\nu - 1} e^{-y} \qquad (0 < y < \infty),$$

then the cumulants of z = γ + δ log y are

$$\kappa_1(z) = \gamma + \delta\Psi(\nu), \qquad \kappa_r(z) = \delta^r \Psi^{(r-1)}(\nu) \quad (r \geq 2), \tag{39}$$

where Ψ^{(s)}(ν) = d^{s+1} log Γ(ν)/dν^{s+1}, the (s + 2)-gamma function, and we write Ψ^{(0)}(ν) as Ψ(ν). We have

$$\beta_1(z) = [\Psi^{(2)}(\nu)]^2 [\Psi^{(1)}(\nu)]^{-3}, \qquad \beta_2(z) = 3 + \Psi^{(3)}(\nu)[\Psi^{(1)}(\nu)]^{-2}. \tag{40}$$

Using the asymptotic expansions for Ψ^{(s)}(ν) we obtain

$$\beta_1(z) \approx \nu^{-1}, \qquad \beta_2(z) \approx 3 + 2\nu^{-1} \tag{41}$$

(valid for ν not too small).

Since β₁(y) = 4ν⁻¹ and β₂(y) = 3 + 6ν⁻¹, we see that S_L does produce a variable with shape coefficients nearer the normal values than those of the original distribution, when applied to Type III variables.
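The ratios (40) and the approximation (41) can be evaluated with a short polygamma routine. Python's standard library has no polygamma function, so this sketch implements Ψ^{(m)}(x) for m ≥ 1 itself. It applies the recurrence Ψ^{(m)}(x) = Ψ^{(m)}(x + 1) + (−1)^{m+1} m!/x^{m+1}, then the standard asymptotic series. The names are illustrative.

```python
import math

# Bernoulli numbers B_2, B_4, B_6, B_8 for the asymptotic series
BERNOULLI = ((1, 1.0 / 6.0), (2, -1.0 / 30.0), (3, 1.0 / 42.0), (4, -1.0 / 30.0))


def polygamma(m, x):
    """Psi^{(m)}(x) = d^{m+1} log Gamma(x)/dx^{m+1} for m >= 1 and x > 0."""
    sign = (-1) ** (m + 1)
    total = 0.0
    while x < 20.0:                     # shift x upward with the recurrence
        total += sign * math.factorial(m) / x ** (m + 1)
        x += 1.0
    series = math.factorial(m - 1) / x ** m + math.factorial(m) / (2.0 * x ** (m + 1))
    for k, b2k in BERNOULLI:
        series += b2k * math.factorial(2 * k + m - 1) / (math.factorial(2 * k) * x ** (2 * k + m))
    return total + sign * series


def sl_on_type3(nu):
    """(beta1(z), beta2(z)) from (40) for z = gamma + delta*log y, y of Type III."""
    p1, p2, p3 = (polygamma(m, nu) for m in (1, 2, 3))
    return p2 ** 2 / p1 ** 3, 3.0 + p3 / p1 ** 2


for nu in (2.0, 10.0, 50.0):
    b1, b2 = sl_on_type3(nu)
    print(f"nu={nu:4.0f}  z: ({b1:.4f}, {b2:.4f})  (41): ({1 / nu:.4f}, {3 + 2 / nu:.4f})"
          f"  y: ({4 / nu:.4f}, {3 + 6 / nu:.4f})")
```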

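(ii) S_L applied to Type VI

If

$$p(y) = \frac{\Gamma(\tau)}{\Gamma(\nu)\Gamma(\tau - \nu)}\, y^{\nu - 1}(y + 1)^{-\tau} \qquad (0 < y < \infty),$$

then

$$\begin{aligned}
\beta_1(z) &= [\Psi^{(2)}(\tau - \nu) - \Psi^{(2)}(\nu)]^2 [\Psi^{(1)}(\tau - \nu) + \Psi^{(1)}(\nu)]^{-3}, \\
\beta_2(z) &= 3 + [\Psi^{(3)}(\tau - \nu) + \Psi^{(3)}(\nu)][\Psi^{(1)}(\tau - \nu) + \Psi^{(1)}(\nu)]^{-2}.
\end{aligned} \tag{42}$$

If both (τ − ν) and ν are sufficiently large we have

$$\beta_1(z) \approx \nu^{-1} - 4\tau^{-1} + (\tau - \nu)^{-1}, \qquad \beta_2(z) \approx 3 + 2\nu^{-1} + 2(\tau - \nu)^{-1} - 6\tau^{-1}, \tag{43}$$

which compare with

$$\beta_1(y) \approx 4\nu^{-1} - 4\tau^{-1} + 16(\tau - \nu)^{-1}, \qquad \beta_2(y) \approx 3 + 6\nu^{-1} - 6\tau^{-1} + 30(\tau - \nu)^{-1}. \tag{44}$$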


(iii) S_L applied to Type V

Here

$$p(y) = \frac{1}{\Gamma(\nu)} y^{-(\nu + 1)} e^{-1/y} \qquad (0 < y < \infty).$$

This case is similar to (i), as is to be expected, since a Type V variable may be regarded as the reciprocal of a Type III variable, while S_L remains of the same form if y be replaced by its reciprocal.




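(iv) S_L applied to Type I (and II)

If

$$p(y) = \frac{\Gamma(\nu + \tau)}{\Gamma(\nu)\Gamma(\tau)}\, y^{\nu - 1}(1 - y)^{\tau - 1} \qquad (0 < y < 1),$$

then

$$\begin{aligned}
\beta_1(z) &= [\Psi^{(2)}(\nu) - \Psi^{(2)}(\nu + \tau)]^2 [\Psi^{(1)}(\nu) - \Psi^{(1)}(\nu + \tau)]^{-3}, \\
\beta_2(z) &= 3 + [\Psi^{(3)}(\nu) - \Psi^{(3)}(\nu + \tau)][\Psi^{(1)}(\nu) - \Psi^{(1)}(\nu + \tau)]^{-2}.
\end{aligned} \tag{45}$$

If ν and τ are sufficiently large, then

$$\beta_1(z) \approx 4\tau^{-1} + \nu^{-1} - (\nu + \tau)^{-1}, \qquad \beta_2(z) \approx 3 + 6\tau^{-1} + 2\nu^{-1} - 2(\nu + \tau)^{-1}, \tag{46}$$

which may be compared with

$$\beta_1(y) \approx 4\nu^{-1} + 4\tau^{-1} - 16(\nu + \tau)^{-1}, \qquad \beta_2(y) \approx 3 + 6\nu^{-1} + 6\tau^{-1} - 30(\nu + \tau)^{-1}. \tag{47}$$

(v) S_B applied to Type I (and Type II)

If, as before, p(y) = Γ(ν + τ){Γ(ν)Γ(τ)}⁻¹ y^{ν−1}(1 − y)^{τ−1} (0 < y < 1), then

$$\begin{aligned}
\beta_1(z) &= [\Psi^{(2)}(\nu) - \Psi^{(2)}(\tau)]^2 [\Psi^{(1)}(\nu) + \Psi^{(1)}(\tau)]^{-3}, \\
\beta_2(z) &= 3 + [\Psi^{(3)}(\nu) + \Psi^{(3)}(\tau)][\Psi^{(1)}(\nu) + \Psi^{(1)}(\tau)]^{-2}.
\end{aligned} \tag{48}$$

If ν and τ both be sufficiently large, then

$$\beta_1(z) \approx \nu^{-1} + \tau^{-1} - 4(\nu + \tau)^{-1}, \qquad \beta_2(z) \approx 3 + 2\nu^{-1} + 2\tau^{-1} - 6(\nu + \tau)^{-1}. \tag{49}$$

These formulae may be compared with (46) and (47).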

As would be expected, it appears that the S_B transformation generally produces a closer approach to normality than does the S_L transformation applied to the same Type I (or Type II) variable. In particular, if the original variable be symmetrically distributed, S_B preserves the symmetry while S_L does not. Table 2 below provides numerical comparisons in a number of special cases.

Table 2

The fifth and sixth lines of this table indicate that, as is to be expected, the relative superiority of S_B diminishes as the (β₁, β₂) points of the Pearson curves approach the Type III line (and are hence nearer the log-normal line). Fisher's z′ transformation for the correlation coefficient, in the case ρ = 0, provides an example of the application of S_B to a Type I variable. We have, putting r = 2R − 1,

$$p(R) = \frac{\Gamma(n - 2)}{\{\Gamma(\frac{1}{2}n - 1)\}^2}\{R(1 - R)\}^{\frac{1}{2}(n - 4)} \qquad (0 < R < 1),$$

$$z = \tfrac{1}{2}\sqrt{n - 3}\,\log\frac{1 + r}{1 - r} = \tfrac{1}{2}\sqrt{n - 3}\,\log\frac{R}{1 - R},$$

$$\beta_1(z) = 0, \qquad \beta_2(z) = 3 + \tfrac{1}{2}\Psi^{(3)}(\tfrac{1}{2}n - 1)[\Psi^{(1)}(\tfrac{1}{2}n - 1)]^{-2}. \tag{50}$$


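(vi) S_U applied to Type VII

If

$$p(y) = \frac{\Gamma(\nu)}{\sqrt{\pi}\,\Gamma(\nu - \frac{1}{2})}(1 + y^2)^{-\nu} \qquad (-\infty < y < +\infty),$$

then

$$\beta_1(z) = 0, \qquad \beta_2(z) = 3 + \tfrac{1}{2}\Psi^{(3)}(\nu - \tfrac{1}{2})[\Psi^{(1)}(\nu - \tfrac{1}{2})]^{-2}. \tag{51}$$

Table 3 compares the β₂'s of corresponding distributions of y and z for various values of ν.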


Table 3

ν    β₂(y)    β₂(z)        ν    β₂(y)    β₂(z)
1    ∞        5·000        4    5·000    3·322
2    ∞        3·806        5    4·200    3·245
3    9·000    3·466        6    3·857    3·197
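The β₂(z) column of Table 3 can be recomputed from (51) with the same kind of polygamma routine, repeated here so that the block stands alone. For the β₂(y) column the sketch uses the standard Type VII result β₂(y) = 3 + 6/(2ν − 5), which is infinite for ν ≤ 5/2 and is not stated in the text. The names are illustrative.

```python
import math

BERNOULLI = ((1, 1.0 / 6.0), (2, -1.0 / 30.0), (3, 1.0 / 42.0), (4, -1.0 / 30.0))


def polygamma(m, x):
    """Psi^{(m)}(x) for m >= 1, x > 0: recurrence up to x >= 20, then asymptotic series."""
    sign = (-1) ** (m + 1)
    total = 0.0
    while x < 20.0:
        total += sign * math.factorial(m) / x ** (m + 1)
        x += 1.0
    series = math.factorial(m - 1) / x ** m + math.factorial(m) / (2.0 * x ** (m + 1))
    for k, b2k in BERNOULLI:
        series += b2k * math.factorial(2 * k + m - 1) / (math.factorial(2 * k) * x ** (2 * k + m))
    return total + sign * series


def beta2_z_type7(nu):
    """(51): beta2 of z = gamma + delta*sinh^{-1} y for a Type VII variable y."""
    a = nu - 0.5
    return 3.0 + 0.5 * polygamma(3, a) / polygamma(1, a) ** 2


def beta2_y_type7(nu):
    """Standard result for Type VII: 3 + 6/(2 nu - 5), infinite when nu <= 5/2."""
    return math.inf if nu <= 2.5 else 3.0 + 6.0 / (2.0 * nu - 5.0)


for nu in range(1, 7):
    print(nu, beta2_y_type7(nu), round(beta2_z_type7(nu), 3))
```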


The above discussion has been concerned only with the shape of the distribution of the transformed variable, as judged by the values of β₁(z) and β₂(z). Even where these quantities differ appreciably from the normal values of 0 and 3, it is, however, possible that the transformed probability integral could be regarded as normal for practical purposes. Further investigation on this point would be of interest.


4. Conclusion
4·1. Critical summary

The following comments may prove helpful in assessing the value of the systems of curves 
described in the foregoing pages: 

(i) The systems S_L, S_B, S_U, together with the normal curve, combine to give a variety of shapes of curve as wide as that provided by the systems of frequency curves in general use.

(ii) The fundamental property that each of these curves may be transformed to a normal 
curve by a simple transformation may be regarded either as a practical convenience, or as 
a desirable property based on the arguments of Kapteyn and Wicksell. The first of these 
reasons is of considerable importance, but it should be noted that simple exact tests of 
significance are, even theoretically, obtainable only for a very restricted range of problems 
(Bartlett, 1947). 

(iii) S_L is, of course, a well-established system. Of the systems S_B and S_U, S_B is based on the simpler transformation, but S_U has the simpler expressions for its moments. The fitting of S_U seems to be more straightforward than that of S_B, except when the limits of variation of the latter are definitely known.

(iv) Curves of S_L, S_B and S_U all have high contact at the extremes of their range of variation. This may sometimes prove a drawback at the finite limits of variation for systems S_L and S_B.

(v) Except for discrepancies at the ends, which may be associated with (iv), curves of S_L, S_B and S_U agree generally with Pearson curves having the same (or nearly the same) first four moments. The use of the former curves may sometimes be considered simply as a convenient aid in calculating rough figures for subrange frequencies of the latter curves. In a note added to the paper by Pretorius (1930), R. Pearson suggests the use of S_L in this capacity relative to certain Type VI curves.

4·2. Other translation systems

A further point of interest arises in the fact that all the moments of all curves in the systems S_L, S_B and S_U are finite. By comparison, it is known that the higher moments of certain of the Pearson curves can be infinite. While finiteness of moments is in many respects a desirable property, it may be argued that such finiteness might restrict the systems relatively to curves with very long tails. It may be noted that such curves might be covered by choosing a different distribution for z. In particular, we might suppose z to be distributed according to the first law of Laplace

$$p(z) = \tfrac{1}{2}e^{-|z|} \qquad (-\infty < z < \infty). \tag{52}$$

Frechet (1928, 1939) has suggested that more use might be made of this law; his arguments, combined with those of Kapteyn, would lead to systems of curves defined by

$$z = \gamma + \delta f\left(\frac{x - \xi}{\lambda}\right), \tag{53}$$

with p(z) given by (52). Inserting the particular forms for f{(x − ξ)/λ}, we would obtain systems S′_L, S′_B, S′_U corresponding to the systems S_L, S_B, S_U. It is easy to show that the system S′_U can have infinite moments: y = sinh{(z − γ)/δ} grows like e^{|z|/δ}, so the r-th moment diverges once r ≥ δ.

Clearly, by choosing fresh forms for p(z) a great variety of systems of curves could be constructed, but practical considerations naturally limit those cases which it is worth while to study. Mention may be made of the work of Olshen (1938), who has investigated certain transformations of the Pearson Type III distribution.

5. Numerical examples 
Example 1 

For this example data used by Pearse (1928) were employed. The data gave the distribution 
of cloudiness at Greenwich for the period 1890-1904, excluding 1901. The last column, 
headed Type I, gives the frequencies obtained from a Pearson Type I curve fitted to the data 
by Pearse. Three moments were used in fitting this curve, the length of the range of variation 
being fixed in advance. 

Curve S_B(1), the frequencies for which are shown in the third column, was fitted to the observed data on the assumption that the degrees of cloudiness stated as 0, 1, 2, ..., 10 could be regarded as corresponding to groups −0·5 to 0·5, 0·5 to 1·5, ..., 9·5 to 10·5. ξ was thus fixed at −0·5 and λ at 11·0. γ and δ were then chosen to give exact agreement in the two extreme groups. These values were

γ = −0·3110, δ = 0·25166.


Table 4 


Degree of 
cloudiness 

Observed 

frequencies 

w 

*b(2) 

Type I 

0 

320 

320-0 

320-0 

321-7 

1 

129 

100-9 

120-9 

121-5 

2 

74 

73-9 

72-0 

75-1 

3 

68 

63-8 

57-5 

61-4 

4 

45 

59-8 

52-1 

56-0 

5 

45 

59-0 

51-6 

65-2 

C 

GS 

63-4 

54-9 

57-8 

7 

65 

72-0 

63-9 

65-5 

8 

90 

90-0 

85-5 

83-2 

9 

148 

135-4 

160-7 

139-6 

10 

676 

676-0 

676-0 

678-0 


1715 

1715-1 

1714-9 

1715-0 

Value of y? 

— 

18-44 

5-76 

6-52 


Curve S b (2) was fitted in the same way, except that it was assumed that the successive 
groups were 0 to 0-5, 0-5 to 1-5,..., 9-5 to 10. £ was therefore put equal to 0 and A put equal 
to 10. The values of y and S obtained were 

y =—0-3110 (as before), 5 = 0-19681. 

iS b ( 2) gives a much better fit then S B (l) and, on the whole, a better fit than the Type I 
curve. All three curves fail to give sufficiently small frequencies in the trough in the centre 
of the distribution. 

Example 2 

The data used in this example relate to the age of Australian mothers at birth of a child 
(single births only) in the period 1922-6. Pretorius (1930) fitted a Type I curve to these data. 
The values of the moment ratios (/? x = 0-101, /? 2 = 2-430) indicate that a curve of system 
jS' b might he fitted. In this case, however, there are no obvious values to assign to the para¬ 
meters i and A. The method actually adopted (like that of Pretorius) was based on trial and 
error. It was decided to attempt a fit which would give nearly correct values for the 5, 30, 70 
and 95 % points of the distribution. Values were assigned to £ and A and then values of y 
and 5 obtained, so that the specified percentage points were, as far as possible, unaltered. 
The process was repeated to improve the fit. Details of the working are now given in respect 
of the curve finally fitted. 

For this curve £ = 15-0, A = 36-5. 




X70 Frequency curves generated by methods of translation 

From a cumulative diagram the following percentage points of the observed distribution 
were estimated: 

5% point 20-3 years 70% point 32-8 years 

30% point 25-6 years 95 % point 40-4 years 

With § = 16-0, A = 36-5, the values of log {(«-£)/(£ +A-a:)} at these points are: 

-0-7096, -0-4192, -0-0247, 0-3918 respectively. 

The normal equivalent deviates for 5, 30, 70 and 95 % are: 

-1-6449, -0-5244, 0-5244, 1-6449 respectively. 


Hence y and $ should satisfy the four equations 

— 1-6449 = y— 0-7096$, (i) 

-0-5244 = y-0-4192$, (ii) 

0-5244 = y-0-0247$, (iii) 

1-6449 = y+ 0-3918$. (iv) 


From (i) and (iv) we obtain y = 0-5978, $ = 1-2649; from (ii) and (iii) we obtain y = 0-5857, 
S = 1-2424. For the curve to be fitted, we took the values 

y = 0-5918, $= 1-2536. 

The table below compares the observed distribution, the distribution corresponding to the 
fitted S B curve and the Type X distribution fitted by Pretorius. 

Table 5 

















N. L. Johnson 


171 


Neither the S B curve nor the Type I curve fit well at the ends of the distribution. The 
a curve appears to give the closer fit in the central part of the distribution. For the range 
with central values 17-47 inclusive, we have 

S B curve: x i = 718; Type I curve: y 2 = 1759. 

Excluding both groups 17 and 47 (central values), we have 

S B curve: a 2 = 530; Type I curve: y 2 = 1375. 

As is almost invariably found to be the case when dealing with very large samples, exceed¬ 
ingly high values of a 2 are obtained. Differences between observation and theory, which 
may be practically unimportant from the point of view of graduation and which might not 
be picked out in samples of more usual size, are statistically significant having regard to the 
large numbers involved, 

Example 3 

The data on length and breadth of beans used in this example and in Example 4 were used 
by Pretorius (1930). In this case we shall fit a curve of system Sjj to the distribution of lengths 
of beans. The mean, standard deviation, /? x and /i 2 of the observed distribution are 


Mean = 14-399 mm,; ^ = 0-829; 

Standard deviation = 0-9036mm.; /? 2 = 4-863. 
Using the abac of Fig. 12 we find 

8 = 2-64; O = y/8 = 0-90, 


whence 

From (37) we calculate 


y = 2-38. 

-1- 1029 - "(f) 


0-5948. 


Hence 


A = 77^375 = i‘51.92, 

0-5948 

£ = 14-399+ M029A = 16-0745. 


There is necessarily some uncertainty in the determination of y and S from the chart, but 
investigation indicated that the degree of uncertainty should not seriously affect the fitted 
frequencies. Table 6 shows the observed frequencies, the frequencies calculated from the 
fitted fig curve, and the frequencies calculated for the Type IV curve fitted by Pretorius.* 


Example 4 

A curve of system S a is fitted to the distribution of breadth of beans referred to in 
Example 3, For the observed distribution 

Mean = 7-9755 mm,; /? x = 0-1943; 

Standard deviation = 0-3399mm.; /? a = 3-6544. 

Following the same procedure as in Example 3 the following values of the parameters of 
theS c curve were obtained: 

8 = 3-55, y = 2-13, A = 0-9721, £=8-6195. 

Table 7 compares observed frequencies, the fitted frequencies calculated from the S v 
curve and those calculated from the Type IV curve fitted by Pretorius.* 

* The groupings in the tails used in calculating x 2 are shown by tile braces to the right of the table. 



172 


Frequency curves generated by methods of translation 


Table 6 


Length 

(central values 
in mm.) 

Observed 

frequencies 

Su 

Typo IV 

<9-25 


2-6 

1-9 

9-6 

1 

2-7 

2-6 

10-0 

7 

5-8 

5-4 

10-5 

18 

121 

11-3 

11-0 

36 

25-7 

24-2 

11-5 

70 

65-2 

62-5 

12-0 

116 

118-0 

113-8 

12-5 

199 

249-3 

243-7 

13-0 

437 

508-7 

603-6 

13-5 

929 

970-6 

968-9 

14-0 

17 S7 

1642-5 

1638-9 

14-6 

2294 

2240-6 

2229-8 

16-0 

2082 

2130-3 

2132-6 

16-6 

1129 

1151-5 

1181-6 

16-0 

276 

290-1 

299-3 

16-5 

66 

32-2 

28-5 

17-0 

6 


1-4 

> 17-26 


0-1/ 


Total 

9440 

9440-0 

9440-0 

Value of x 4 

— 

87-1* 

102-5t 


* Excluding the ‘over 16-26’ group, x z = 66-3. 
f Excluding the ‘over 16-26’ group, x 2 = 70’1- 


Table 7 


Breadth 
(central values 
in mm.) 

Observed 

frequencies 

Su 

Type IV 

<6-26 


1-31 


6-376 

4 

3-7/ 

4*8 

6-625 

10 

13-8 

13-3 

6-875 

72 

53-2 

49-9 

7-125 

170 

182-2 

177-2 

7-375 

530 

557-8 

557-9 

7-625 

1397 

1394-2 

1413-3 

7-875 

2679 

2507-0 

2630-5 

8-125 

2742 

2757-0 

2732-5 

8-375 

1483 

1644-4 

1515-4 

8-625 

400 

381-5 

393-6 

8-875 

48 

41-5 

48-6 

9-125 

5 

2-31 


>9-25 


0-1/ 

3-0 

Total 

9440 

9440-0 

9440-0 

j Value of 

— 

17-47 

14-36 





N. L. Johnson 


173 


APPENDIX 

Moments of distributions in system S s 

j£„ = y + <51og{»//(l - ?/)} and z is a unit normal variable, then the rth moment of y about zero is 

y'Av) = V(W J_ „ e_V( 1 + **~ y),5 )- r dz ■ (64) 

This integral is not easy to evaluate directly, and values of /i' 3 , /4 and were obtained by the following 


steps. 


(i) For the case r= 1, tho expected value 




( 66 ) 


can bo evaluated directly using a result duo to Mordoll (1920, 1933). To throw (66) into a form to which 
Mordoll’s result applies, wo make the transformation t-y = - 2nSz leading to 


where 

By Mordell’s formula 


CO 

= J(2ir)&rl/ I e vi] l' l '~ i ’" ll (l + e 3 ’' , )~ r dt, 
v = —yS, \jr = 2 nS 2 i. 


whero 

and 


q = = fl’"# 


0 oo (i>,i (r) = H 6 «Vi trtfnvi,’- 1 + 2 ^ «»’»tfcaa2ron\ 

-CO H=1 


After some algebraic simplification, this leads to tho result 

JiH + 5" 1 S e~»‘Wcosh --rg " soch^- 2nd S e~ il2 »-i)VS’ s i n (2?i--l)7ry<Jcosech(2?i-l)7r 2 il !1 

1 . , ti=i *6 


'vw 


He ■ 


» = 1 _ 

oo 

1 + 2 E e~ 2,l ' 7 ’’ J 'cos2ri77y6 
»=l 


( 66 ) 


(ii) The higher moments can be expressed in terms of tho partial derivatives of tho first moment with 
respect to y. From (54) 

dfi'r r 1 f 03 c -a-yi/5g-l:' ^ 

Sy “ ~ 0 j(2?r) J _ oo (1 + a-b-vMyn dz 


Hence 


r 1 f m ((l + e-< ! -r>«)-l}e-^‘ 

~ ~JV(2m) J -« (T+T 1 ^®)^ 

= — S^ r ~ 

, ,JWr 

/h + I=/h + -^. 


<lz 


( 67 ) 


Applying this formula with r = 1, 2, 3 successively wo obtain 



174 Frequency curves generated by methods of translation 

Formulae (58), together with (66), make it possible to calculate /h, Hi and fi\. Although the analytical 
expressions for these moments must be very complicated, thoir numerical computation is straightforward, 
though tedious. 

(iii) The computational labour may be reduced by using the recurrence formula developed below. To 
emphasize the dependence of ji'r 011 Y and ws s h a ^ write 

1 I* <x> 

Then 8 ) * -77F-T a '“‘{( 1 + e ^~ y),S ) ~ e ^~ v ^ < 1 + ^~ y),s )- T * 

VKM) J -00 

= p'r-ily, 6 ) - c^r*" fitly + S). 

Hence ^(Y + ^V) * 8-» a " , +y*" ,) [/t r , _ l (y,^)-^(Y,^)]. (69) 

Remembering that ft& = 1 for all y and S, (59) makes it possible for the first four moments to ba 
calculated with fair rapidity for series of values of y at intervals of 5' 1 . 

A few of the calculated values of and j}% are shown in Table 8. Generally the moments wore calculated 
by methods (i) and (ii) for values of y between 0 and 5' 1 ; further results were then obtained by means of 
(iii). It was necessary to take care to avoid accumulation of errors in applying (iii). The first sets of 
moments were calculated to eleven decimal places, and the values of fix and /? 2 obtained should be accurate 
to the five places of decimals shown. 

Table 8 


Y 

5 = 0-5 

o 

T~\ 

II 

«0 

X 

cr 

A 

fit 

Hi 

CT 

Pi 

n 

0-0 

0-5 

1-0 

1- 5 

2- 0 
2-5 

0-50000 

0-35227 

0-22480 

0-12959 

0-06767 

0-03225 

0-31396 

0-29610 

0-24873 

0-18679 

0-12615 

0-07717 

0-00000 

0-36363 

1-64723 

4-59189 

11-15751 

26-40534 

1- 62731 

2- 07024 

3- 66177 
7-36733 

15-97162 

37-17352 

0-50000 

0-39797 

0-30327 

0-22147 

0-15546 

0-10636 

0-20829 

0-20151 

0-18262 

0-15541 

0-12465 

0-09468 

0-00000 

0-12803 

0-52856 

1- 24787 

2- 36420 

3- 98326 

2-13828 

2-32409 

2- 90011 

3- 98260 
6-71101 
8-34077 


Y 

5=2-0 


Hi 

cr 

Pi 

fit 

0-0 

0-5 

1-0 

1- 5 

2- 0 

2-5 

0-60000 

0-44125 

0-38402 

0-32971 

0-27942 

0-23394 

0-11813 
0-11066 
0-11235 
0-10560 
0-09607 
0-08710 

0-00000 

0-02084 

0-08279 

0-18400 

0-32168 

0-40116 

2-63131 

2-66720 

2-77419 

2- 95062 

3- 19309 
3-49645 


The transformation generating the system Sjj is symmetrical about y = -J, so that when y = 0 the 
distribution of y is symmetrical. Positivo values of y correspond to positive skewness. 

Since the curve is symmetrical when y = 0, it follows that fi[( 0,5) = whatever bo S. This can bo 
verified from (56). Putting y = 0, we obtain 

1+2 1 e-n’PS* 

K( 0 ,*) = -- r -~—- V 

7(277)5 1+2 s 6 - anVlS ' 

L n=l 










N. L. Johnson 


175 


This expression can be shown to be equal to -J- by putting i = 0, r = 2nS( in Jacobi’s imaginary transforma¬ 
tion of the theta function , , . 

(Whittaker & Watson, 1946, p. 476). 

Also, since fiftO, <S) = 1» it follows from (69) that 

/tJ(J-i,«S) = e-W-(l-i) = le-U~\ 

Again /tj(25 _1 , J) = = «-»-* - 

Similarly fiftSS- 1 ^) = e~ iS — e - !* ' + *, 

and generally, if A be a positive integer greater than unity, 


ftftkS^.S) = (-l)*+ie-U'4-T}+ 2 (_l)" C i«V-*1 

L 5=1 


( 00 ) 


(GO) is useful as a chock formula. 

Further interesting formulae may be obtained starting from the equation 

0 = MO, 8) = ^(0,5)-1/4(0, J) + l. (61) 

Such formulae arc useful in chocking calculations, but do not load to simple formulae for ftftkd -1 , 8), 
sinco fiftO, 8) is not a sufficiently simple function of 8. 

We now proceed to consider limiting values of and /? 2 for Sjj. We can write (69) in the form 


fiftkS-'.S) = (62) 

the forward difference, A, applying to the subscript of /i'. Applying (62) repeatodly and noting that (62) 
holds for negative as well as positive values of r, we have 

fiftk8~\8) = (-l)*e-»*'* -, A*/t r '-*(0 ,£). 

The sth negative moment of y about zero is easily found to be 


(63) 


fiLfty.S) = S eiiV-’+fr*- 1 , 
t=o V/ 

whence, if k > r, 

r/c-r flr — t 4 -r-1 

/<;(H- i ,<s)^(-i)* : e-i rt - s S(-i) j - r , )e»‘‘«-*+ s (- 
U=o \ r-1 / t=0 


I) 1 ' 


/4~I(0,8) . 


Hence if k is large fi' r (k8~ 1 ,8)==e~^ k, ^~’e^ k ~ r) ’^ 1 

sSse-1 BrtWfli-r'lJ-',, 

ft'Ay, 8) is a continuous decreasing function of y for sufficiently large values of y. Hence 

fifty, 8) == 8- r y'S -l +l rl 4 -1 when y is large, 
and so lim fifty, 8) = (< 0 — 1) (w + 2) a , 


(64) 


(65) 


lim /? 2 (y, 8) = w 4 +■ 2w 3 + 3 ai 2 — 3, 

co 


where 


( 66 ) 


As y increases from zero to infinity, therefore, the (/? t ,/? 2 ) point moves from a point on the axis of /? 2 
to apoint, on the S /, line (cf. (14)). 

Now consider the behaviour of fifty, 6) as 8 decreases, y remaining fixed. When. 5 is small we have 

(l+e _(!!_ T' )W ) _r =0 for z<y 
== 1 for z > y, 


so that 




(67) 


As 8 decreases the (/? lt /? 2 ) point therefore approaches a point on the boundary line /?,—/?!— 1 = 0. As 
y varies all points on the boundary are covered. 



176 


Frequency curves generated by methods of translation 


REFERENCES 

Aroian, L, A. (1941). Ann. Math. Statist. 12, 429. 

Baker, G. A. (1934). Ann. Math, Statist. 5, 113. 

Bartlett, M. S. (1937). Suppl. J. 11 Statist. Soc. 4, 137. 

Bartlett, M. S. (1947). Biometrics, 3, 39. 

Beall, G. (1942). Biometrika, 32, 243. 

Burr, I. W. (1942). Ann. Math. Statist. 13, 215. 

Charlier, C. V. L. (1905). Ark. Mat. Astr. Fys. 2, nos. 8 and 20. 

Cochran, W. G. (1938). Empire J. Exp. Agric. 6 , 157. 

Cornish, E. A. & Fisher, R, A. (1937). Rev. Inst. Int. Statist. 5, 307. 

Curtiss, J. H. (1943). Ann. Math. Statist. 14, 107. 

Edgeworth, F. Y. (1898). J. R. Statist. Soc. 61, 670. 

Epstein, B. (1947). J. Franklin Inst. 244, 471. 

Feciiner, G. T. (1897). Kollcletimmsslehre. Leipzig: Engelmann, 

Finney, D. J. (1941). Suppl. J. R. Statist. Soc. 7, 155. 

Fhechet, M. (1928). Bull. Sci. Math. 63, 203. 

Freohet, M. (1937). Recherchcs Thaoriquus Modames, 1. Paris: Gauthier Villars. 

Fhechet, M. (1939). Rev. Inst. Int. Statist. 7, 32. 

Freohet, M, (1945). Rev. Inst. Int. Statist. 13, 16. 

Gaddtjm, J. Ii. (1946). Nature, Bond., 156, 463. 

Gallon, F. (1879). Pm. Roy. Soc. 29, 365. 

Geary, R. C. (1947). Biometrika, 34, 209, 

Gibrat, R. (1931). Les In&galit&s flconomiques. Paris: Libraire du Recueil Sirey. 

Halmos, P. R, (1944). Ann. Math. Statist. 15, 182. 

Hotelling, H. & Franrel, L. R. (1938). Ann. Math. Statist. 9, 87. 

Kapteyn, J. 0, (1903). Skew Frequency Curves in Biology and Statistics. Groningen: Astronomical 
Laboratory. 

Kapteyn, J. C. & van Uven, M. J. (1916). Title as above. 

Kolmogoroee, A. N. (1941). C.R- Acad. Sci. U.R.S.S. 31, no. 2, 99, 

McAlister, D. (1879). Proc. Roy. Soc. 29, 367. 

Milne-Thompson, L. & Comrie, L. J. (1931). Standard Four.Figure Mathematical Tables. London: 
Macmillan. 

Morrell, L. J. (1920). Quart. J. Math., Oxford, 48, 329. 

Mordell, L. J. (1933). Acta Math. 61, 323. 

Olshen, C. A. (1938). Ann. Math. Statist. 9, 176. 

Pearse, G. E. (1928). Biometrika , 20A, 314. 

Pearson, E. S, (1931). Biometrika, 23, 114. 

Pearson, K, (1895). Philos. Trans. A, 186, 343. 

Pretorius, S. J. (1930). Biometrika, 22, 109. 

Quensel, C. E. (1945). Sfcand.'Mfctwar. 28, 141. 

Rietz, H. L. (1922). Ann. Math. 23, 291, 

Whittaker, E. T. & Watson, G. N. (1946). A Course of Modern Analysis, 4th ed. Cambridge Univer¬ 
sity Press. 

Wicksell, S. D. (1917). Ark. Mat. Astr. Fys. 12, no. 20. 

Williams, C. B. (1937). Ann. Appl. Biol. 24, 404, 

Williams, C. B. (1940). Biometrika, 31, 356. 

Wilson, E. B. & Hilferty, M. M. (1931). Proc. Nat. Acad. Sci., Wash., 17, 694. 

Wishart, J, (1947). Biometrika, 34, 170. 

Yuan, P. T. (1933). Ann. Math. Statist. 4, 30. 



[ 177 ] 


RANK AND PRODUCT-MOMENT CORRELATION 
By M. G. KENDALL 


Summary 


1 . Various relations are known to hold between the rank correlation coefficient which 
I have called r, Spearman’s rank correlation coefficient (which I shall denote by p s ), and the 
correlation p in samples from a normal bivariate population. Eor instance, writing t for 
a value of r in a sample of n and similarly r a for a sample value of p a> we have Greiner’s relation 

E(t) = ^sin-ip, (1) 

Bsscher’s formula for the variance 


rar ‘ - wAti [‘ “ (l 3i "“ v )"+ 2 (» - 2 ) (5 - (; *e)’}] ■ 

Moran’s relation (1948) 


E(r a ) = - 


6 


- (sin -1 /) + (n — 2) sin “ ] )p} 


( 2 ) 


(3) 


n(n+ 1) 

and K. Pearson’s formula connecting grade and product-moment correlation which is a 
limiting case of (3), g 

Pv = -sin-Hp. (4) 


Derivations of these results, other than Moran’s, and references, are given in my Bank 
Correlation Methods (1948). 

2 . In the present paper I extend these results in various directions and consider some 
practical applications. 

{a) I shall examine the effect of non-normality on equations (1), (2) and (3); 

(6) An investigation into var r s suggests that an exact expression involves non-elementary 
transcendental functions; 

(c) An expansion of varr s in powers of p, however, is obtainable, and comparison with 
K. Pearson’s formula for the variance of estimates of p based on equation (4) indicates that 
his result is incorrect; 

(i d ) The extension of r to m x n tables offers some possibility of estimating the first product- 
moment coefficient, even in non-normal variation. 


Effect of non-normality on t 

Note on the bivariate Gram-Charlier series 

3. Most studies in the past on the bivariate Gram-Charlier series consider an expansion 

(5) 




of the type 

f(x,y) =®{x,y)+ £ 

r+s>3 ?I o! 

where D 1 = djdx, D 2 = djdy and 


= 2^ (l->)t eXP [~2(T^) (a:a ~ 2p!Cy + ya) . ' 


(«) 


Biometrika 36 



178 Ranh and 'product-moment correlation 

It is much simpler, at least for my present purposes, to use the alternative form 

f(x,y) 

v 

1 


exp [s'( - l) r+s «(*) a (2/)> 


where 


a(x) = 


V( 2 *) 


-ix 1 


(I) 

( 8 ) 


and the summation S' extends over all values r +s> 3 together with the term k u . Here the 
k’s are the bivariate cumulants and the distribution is in standard measure, so that 
K n = /hi ~ P an d the terms in /c 10 , x 01 , k w and do not appear. It is easily shown that, as 
for the univariate case, the cumulative function of (7) is 
f(h,t 2 ) = log <j>(t v t 2 ) 

' 2 (H-\ 2 {itiYitiiY 


2 ! 


3i-+!h2+SV 


r\ a! 


= - \(t\ + t 2 +4)+ s K r 


(itxY (it 2 


0 ) 


r+os r\ a! 

4. The equivalence of (5) and (7) when the constants A rs are suitably expressed in terms 
of the cumulants is established without difficulty. The essential difference lies in the appear¬ 
ance of k 11 D 1 D 2 in the operator in (7) and the operand a(x) a(y) instead of <&(£, y). Now 

exp(fC 11 2) 1 D 2 )a(3:)a(i/) (10) 

has a cumulative function ~Wl + %Ph k + 4) > 

and is therefore equal to 0(m, y). Or again, if we expand ( 10 ) in powers of p (= x n ) we get the 
first differential coefficient with respect to x and to y of the tetrachoric series, which is 
equal to Q>(x,y).* 

5. In passing it may be noted from (7) that we have 




( 11 ) 


K. Pearson (1907) gave a particular case of this formula for the bivariate normal form, 
namely, 5 / _ 9 2 / 

dp dxdy' 

Relation between t and p 

6 . If two pairs of observed values are x it y i and Xp y.j, we write 

a H = sgn (Xi-Xj), 

= 1 { x i> x j)> 

= 0 (*<=*,), ■ (13) 

= -l (*<<*,■),. 

and similarly b i} = sgn ( Vi ~y } ), 

= i (yi>y } ),' 

= ° (& = %)»■ (14) 

=“! iVi < Vi)-, 

t m a sample of n is then defined as 

/ = 

= ^iAM n ~ l )> ( 15 ) 

If (7) is expanded, the coefficients will not in general decline so rapidly as in the expansion of 
(6), but as both forms have the same cumulative function this does not affect the above work. 



M. G. Kendall 


where the summation £ takes place over the n(n- 1 ) possible pairs of comparisons in the 
sample. 

Thus t depends only on the signs of the values x t —Xj, y i — The cumulative function of 
x . - Xj and y t - y, is given by 

f (<i> k) = log -®[exp {¥(*« - x j) + ~ Vj)}] 

= iog^i.^ + log^-tj, -t 2 ) 

= - (tl + Zpt^ + i) + 2 (16) 

where £'' extends over even values of r + s ^ 4. We then have at once the simple but important 
result: Greiner’s formula (equation (1)) is unaffected by skewness in the parent population 
as measured by its moments or cumulants of odd order. 

7 . I now neglect powers of fourth-order oumulants higher than the first and cumulants 
of order six or more. The cumulative function (16) then becomes 

f(h, k) = - (*a + 2 p<i k + t\) + 2 j 1 { H + + ~ M 2 + 24 ^ij > ( 11 ) 

and the o.f. (characteristic function) to our order of approximation is 

4>{ti,k) = exp[-{tl+2pt 1 t 2 + tl)]{l + ~ti + ^tlt i + ^tltl + l ^t 1 tl + l \ a jt i }. (18) 


1 e il ± 

Using the unitary function sgn£ = - -r-dt 

rr j - co rt 


= 1 (l> 0 ), 

= 0 (1=0), 
= -l (g< 0 ), 


we have, from (15), 

m = E(aM 

= £{sgn (x t - Xf) sgn (y t - y { )} 

r oo co 

= I dxdyf(x,y)Bgsx{x i -x j )sgn{y i -y j ) 

J — CO J — co 

1 I* 00 fit P 00 rJf f 03 r 03 

= — 2 jf dxdyf(x, y) exp + 

" J — 00 vli J - co J — oo J — 00 

1 f® dt , f” d < 2 J . /on , 

where ^(i x , t 2 ) is given by (18). Owing to the differentials of type dtjt we can change the scale 
of t x and k how we like and, reducing it by *J2 S we have 


In this, one of the simpler cases to be discussed, the integrals may be evaluated directly in 
various ways, but to assist later generalization it is as well to systematize the evaluation 
from the start. 

Writing r/ P \ „ 1 f°° ¥f°° dt 2 o ^ r . . o 4 . ioo.\ 


V C«^). 


= 4 f fM nr exp [ - £(a£f + 2pkk +P4)] 

« J — co J — 00 ‘'t'2 


12-2 



180 

we have 


and hence 


Manic and product-moment correlation 

dt 2 exp [ - \{at\ + 2 pt t t % + ftt\j] 

_P*Y l 

aft) ’ 


SL _ i | 

m 2 

dp 7T 1 


W(« 0 ) 


T 2 ' P 
L = -sm ' 1 

7T V a /? 


(23) 

(24) 


Putting a ~ ft - \ we reach equation (1). Differentiating L twice with respect to a we find 
for the coefficient of K i0 in ( 21 ) 

48[ 4 3^ L (v&j)l^ 1 ’ 


which reduces to 


247T 


(3p-2p»)(l-/,V. 


The other terms are evaluated in a similar way, the coefficient of k 31 being given by 
d^Lftdadp) and that of x 22 by 3 2 A/3p 2 . Except for the terms in K i0 and /c 04 we have one differ¬ 
entiation already carried out in equation (23). We find 

E{i) = ? sin - 1 p -I- {(*« + *«) (3p - 2p 3 ) - 4 (* 31 + x 13 ) + 6 px 22 }. (25) 

8 . At first sight it looks as if this expression is at fault near p = 1 , since (1 — p 2 )-* there 

becomes large. But there are limits on the value in braces in (25) which prevent the whole 
term from becoming large. Consider, for example, the extreme case when p = 1 . Since x 
and y are in standard measure we then have y~x = 0 , and hence E{e m ~ x) } =■ 1. Thus, con¬ 
sidering the separate vanishing powers in the expansion of this expectation we have, for 
the cumulants r , , 

2 (- 1 )'('**_,«<>, (26) 
l-o \J/ 

and in particular k w - 4k 31 - 1 - 6 k 22 — 4/c 13 + = 0, / 

which is the value inside braces in (25) when p = 1. One suspects that it might be possible 
to set limits to the corrective term in (25) when p is given, but I have not reached any very 
useful expressions for the purpose. 

9. By the substitutions 

*4o = / { 40 — 3; K 31 = Psi - 6 p> *23 = P 22 — 2p 2 — 1, (27) 

it is readily verified that (25) remains true when p’s are written instead of k’ s. Since no 
cumulant of order higher than two requires a grouping correction, raw or corrected moments 
may be used indifferently in place of cumulants in (25). 


Effect of non-normality on var f 

10 , The evaluation of var t proceeds from that of E(t 2 ) by a development of the foregoing 
method but is considerably more complicated. We have 

E(t 2 ) = E^a^fln^n-lft 

= S{E(ay6^a w 6 fc! )}/n 2 (ri.— l) a . 


(28) 



M. G. Kendall 


181 


There are three types of case: 

(i) 2 n(n - 1) cases where i = k,j = l. The expectation of each term is + 1 . 

(ii) n(n - 1 ) (n — 2 ) (n - 3) cases where i + j q= 1. The expectation of each term is then 

{Eatjbijf. 

(iii) 4 n{n - 1 ) {n - 2) cases of the type when i ~ h or j = l hut not both. This is the trouble¬ 
some term to evaluate. Consider the case i — k- 1, j = 2, l = 3. Then 


^12^13^13 


) = 


1 

,c0 d4 

r® d4f m ^4 

tf 4 . 

_co i-4. 

' —oo if 2 J —oo ^^3 t 


dt x 


0(4i 4> 4> 4)> 


(29) 


where 0 ( 4 , 4 > 4 ) is the c.f. of aq - x tl y l —y 2 and % - a: 3> y x - y 3 (reduced to any convenient 

scale, which throughout I determine by making the coefficient of terms in i 2 equal to - i). 
This c.f. is the mean value of 


exp L*h( :c i - x i) + *4(24 2 / 2 ) "t ^4(®i a a ) + i4(l/i 2 / 3 ).l 

= exp [-£(<! + 4 ) aq + i(U -I- 4 ) y 1 - it l x i -it 2 y 2 - it 3 x B - tf 4 J/ 3 ]. 

When we substitute expansions of the Gram-Charlier type we may neglect those terms which 
give a total odd power in t’s, such as t\t 2 , for the resulting integral will vanish. Thus no terms 
in the first power of the odd-order cumulants will survive but there will occur terms of type 
x 30 > K 3 a K 2 v etc. r ^° our order of approximation the c.f. becomes 


0(4> 4. 4. 4) = exp [ - \{t\ + 1\ + if + 1\ + 2pt x 4 + 44 +p4 4 +pU 4 + 44 + 2 p 44 )J 

x f 1 + atf ^ + 4) 4 + + 4}+ 24 " {(4 + 4 ) 3 (4+4)+4 4+*§ 4} 
+yf{(4+4) 2 (4+4) 2 +<!<!+ 2 ^ 4 } + 24"{(4+4) (4+4) 3 +4 + 4$} 
+|f{(4+4) 4 +4+«-j[Yf{(4+4) 3 -<?-/!} 


+^H(4+4) 2 (4+4) - l l 4 - ( i 4} + ^{(4 +4) (4 +' 4) a -4 4 - 4 <!} 
+§{(4+4) 3 -M}J]- (so) 

We have to substitute in (29) and evaluate term by term. Tor the term independent of the 
k’s we have, from the known result leading to equation ( 2 ), 


'2 • -i \ 2 

-sin V) - 
W / 



+ y = M{p), say. 


(31) 


11 . The evaluation of the integrals may be assisted by a general theorem which I now 
prove. 

Consider the integral 


/ = — 
7 T m 


d4 f 00 

— CO ^4 


^4 
? if.* 


exp[- 


IV, 

2 ^' 


/44I’ 


(32) 


where m is even and the summation takes place over i and j from 1 to m. We may change the 
scale as we please and hence 



(33) 



182 Bank and product-moment correlation 

when a is any positive constant. Then if the suffixes k, l, u, v are all different and rn in 

number / g g \ lf“ f w f l 

It —... -—)/ = •— 7 —jT- dt... dt m ex p ~ — 'Ea ij t i t j . (34) 

\3% 3a J 7r m (2a) lm J_ M 1 J-. ** T 2a 13 v ' 

Now let us suppose that a is the determinant of the form S a^tftj which I suppose to be 
positive-definite and non-singular. Then the expression on the right in (34) is derivable 
immediately because the integral is that of a multivariate normal population and is thus 
equal to (27r) i "*a i . Hence /a ? \ 1 

(35) 


so that 


\ I'ati /*««• 

~^Ti da ki---j da ut> a- 


This is not in general a very easy way of deriving I but once I is known it gives a fairly simple 
method of deriving the integral of terms of type ijdij*... ijp exp [ - i-Sa l} t t t } ]. Partial differ¬ 
entiation of (32) according to appropriate gives the powers in t. 

12. Another useful device can be used for the terms in the fourth-order cumulants. 
We have 

+ /?(< 2 +< 4 ) a + ail + 2p< 4 + pt\ + at\ + 2p£ 3 i 4 + /?<|}]. (37) 

Then the term in /c 40 is the integral of the exponential term multiplied by a term 

(h+h )*+<}+ t\ - £{(*, +t s ) z +ti+ if} 2 , 

and hence can be derived by twice differentiating M(pj*J(<x{l)) with respect to a. Similarly, 
the term in k 31 is given by differentiation with respect to a and p. The term in k 22 is given by 
a differentiation of the form 

da dfi dp 2 ' 

13. I omit the detailed working and give the results, Writing 

A = 2^2 f 1 sirV +P 2 (1 - P 2 )' 1 

- (Ip - ip 3 ) (i-ip 2 ) - * sin- 1 ip-MWp 2 )- 1 }, (38) 
B = - 6^2 W 1 sin-V +p(l-p 2 )" 1 -1(1 - ip 2 )-i sin-1 y _ j p(1 _ jp.j-i), (39) 

0 “ - ^ 2 )' isin_1 p +( 2 + p 2 ) (i - ?v 

- iP(l - ipT* sin- 1 \p -1(2 + p 2 ) (1 - ip 2 )}, (40) 

1 P 2 (3 -p 2 ) 

1 Srr 2 (1 — p 2 ) 2 ’ 

_ 1 1 + p 2 

3tt 2 (1 -p 2 ) 2 ’ (43) 


977 2 (l-p 2 ) 2 ’ 


(44) 



M. G. Kendall 


183 


1 ( 9 + 48p 2 — 16p 4 _ 3 p _ (|-p > 2 _ P 2 (3~P 2 ) 

rl — „2\2 lft/1 2" nr>/i Ow, 1 H»T 


36(1—p 2 ) 2 16(l-ip 2 )«"‘“ 2P 12(1 — p 2 ) (1 - ip 2 ) 2 ^ 24(1 -p 2 ) (1 -Ip 2 ) 3 / ’ 

(45) 

1 f - 20 p 1 ( 2 +p 2 ) p(l+p 2 ) 

^|(1-P 2 ) 2 8 U-iV a ) 8 24(1-ip 2 ) 2 (1-p 2 ) 

2 P 


4; 


, P (1 +P “) 2 I 

I i a/l 1 ^9.\5). /I \ / 


3(1 —p 2 ) (1 — ?P 2 ) 12(l-ip 2 ) 2 (l-p 2 ) 2 
The additional terms to be added to that of (31) are then £, say. where 
£ = d(/c 40 -F /f 0 .j) 4 4 k 13 ) + (7k 22 4- D{k | q 4 * 2 S ) 4 (* 30*21 4 * 12 * 03 ) 

4 F(k 3Q k 12 4 K 2a * 03 ) + 0*30 *03 + # (*li 4 *? a ) 4./* al * 12 . (47 ) 

We then have 
vartf = E{f-) - {E{t)Y 

= i) -[ 1 -( 2 ^-3){g(l)} a + 2(n-2)((^sin- 1 p) -(“ sin_ Hp) + ?Kj • ( 48 ) 


Effect of non-normality on r g 
The relation between r s and p 

14. If we write 

then Spearman’s p s may be defined as 


n n 

«i = S%, (»i=S 6 y, 

j'-l 1=1 


P 3 = 


We then have 

ia(a 2 -l)E(r s ) = E(Sa i 6 i ) 

= -%A) 


j—1 _ 

i?i(n 2 — 1 )’ 


(49) 


(50) 


Now 

and 


= i?(Sa y £ 6 ft ) 

= m(»-l)i?(ay 6 <# ) + w(n-l)(w- 2 ) J 0 (a# 6 tfc ) (j^k). 

Eidyby) = i?(l) = “Sin _1 p, 

E(a 12 b u ) = ~ (" ~ f ^{eiViUi-JaHiWtfi-i/a)} 

7T J —OO^hl J — 00 

= -4f grf ^rexp[-i(f|4pfi« 2 4il)] 

7 T 2 ]-< 0 lt 1 J-colt-i 


(51) 


Substituting in (51) we find 
E(r s 


= - sin - 1 ip. 

TT 


6 


7i(n4 1) 


{sin -1 p 4 (n — 2 ) sin -1 -Ip} 


(52) 


(53) 


as given in equation ( 3 ). 



184 Marik and product-moment correlation 

15. Consider now the modifications necessary to allow for non-normality. The cumulative 
function of x 1 ~ and y x — y ; , is now, as far as fourth-order cumulants, 

- (i\ + Ph h + © + ter (*i ) 3 + ~ («A ) 2 (*© + ^ ^ 


+ y (**)*■(*i ) 4 + x K ) 3 (*© + -f («x)* (*©* 

+(»#i) (<■«■)*+ §W) ~ k*(‘Vi) a - kon(^) 3 

+ ^K) 4 + ^(il 2 )h 

A few terms in this cancel. We may neglect those which give odd orders in the total power of 
the 1-terms. We find then for the c.f. (reducing the scale by a/ 2 in the usual way) 

exp [ - \(t\ + pt-L t 2 +©] {1 -f + ~T71\ U +~tt t\ t\ + t l 1\ + yr7rd —rritai t\ h + K u h 4) 2 1 ■ 


On integrating over (dtJUJ (dl a /il 4 ) in the usual way, we find 
= ^~jy{Biu- 1 p+ (ii - 2) sin 1 ip} 

+ + k m) + M K 3i + ^ 13 ) + Afx 22 + N(kI 4- + PK li K il }, (55) 

where K » ~[{3(fe>) - W) (1 ~ (56) 


i = - S (1 -R ! , (57) 

M = ^(l-kp 2 )~ l > (58) 

N = (59) 

P= iir {1+W}{1_ip2} ' i - (60) 

16. It is instructive to consider how far tho limit of equation (55) for large n can be 
derived by an extension of K. Pearson’s original method. A comparison of the two approaches 
well illustrates the advantages of the use of the characteristic function in the bivariate Gram- 
Charlier expansion. 

If At*) and/ a (y) are the border-frequency distributions, the grades £, and 7) are defined by 

£ = A{x)dx, My) dy. (61) 

J -CO J -CO 

Each has a mean of ^ and a variance of -j~V • if or the grade correlation we then have 

Po = 12 f f 0,yfdxdy— 3, (62) 

J — OO J — CO 

where/is the frequency function/(.r,?/) and hence 





M. G. Kendall 


185 


Xii virtue of equation (12) this becomes, after partial integrations with respect to x and y, 


3 % 

dp 



9|_3 V 

dxdy 


-fdxdy 




(63) 


This result is generally true when p is the first-product moment x n entering explicitly into 
the frequency function. Pearson substituted for f lt / 2 and / in (63) in terms of the normal 
frequency functions and integrated to obtain 


tyg _ 


3 


(64) 


dp 7r( 1 — ■}/>*)*’ 

whence equation (4) follows. 

Now with tire notation of § 3, but writing ji for a.(y) we have to our usual order of approxi 
mation: 


A = (i- K fi>l+ K £DH^l)P, 


/ = |l - J D> -If D\D 2 -^flJM -3pi + g-zn 

+ K -fD\D\ + K -fD,D\ + + ^( K f f\ +^fl\ D\ + ^L/l|) 2 ) eh. 


When we substitute in (63) we obtain for the integrand a series of products of derivatives of
α, β and ψ. By partial integrations we can convert these into products of ψ and partial
derivatives of α and β. It is then found that terms involving $\kappa_{30}$ and $\kappa_{03}$ vanish. We are left with


K-, 


d>{ 1 + -§D\ D, + + f D x l)\ 


+^n+f-D]Di+ 


Kil A'l 


-D\D\^l)\D^ap, ( 66 ) 


These expressions can be evaluated term by term. A comparison of (65) with (54) shows how
the various terms correspond. The numerical coefficients in the latter are ¼ of those in the
former for κ's of the fourth order and ⅛ for κ's of the sixth order, because the exponential in
the integrand in (65) is on a scale √2 times that of (54).


The sampling variance of $r_s$


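17. To find the variance of $r_s$ we have to evaluate $E(r_s^2)$, and here a novel point appears.
We have

$$\{\tfrac13 n(n^2-1)\}^2 E(r_s^2) = E\Big(\sum_{i,j,k} a_{ij}b_{ik}\Big)^2 = \sum_{i,j,k}\sum_{l,m,p} E(a_{ij}b_{ik}a_{lm}b_{lp}). \qquad (66)$$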





Rank and product-moment correlation 


One term contributing to this sum will be $E(a_{ij}b_{ik}a_{lm}b_{lj})$, where the suffixes with different
letters are not equal. The c.f. of a typical term is the mean value of

$$\exp[it_1(x_1-x_2) + it_2(y_1-y_3) + it_3(x_4-x_5) + it_4(y_4-y_2)],$$

and is

$$\exp[-\tfrac12(t_1^2+t_2^2+t_3^2+t_4^2+\rho t_1t_2+\rho t_1t_4+\rho t_3t_4)]. \qquad (67)$$

The expectation of the term is therefore proportional to

$$\frac{1}{\pi^4}\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty}\frac{dt_1\,dt_2\,dt_3\,dt_4}{t_1t_2t_3t_4}\exp[-\tfrac12(t_1^2+t_2^2+t_3^2+t_4^2+\rho t_1t_2+\rho t_1t_4+\rho t_3t_4)].$$

If we differentiate twice with respect to p, one part of the resultant integral will be 

ri\f ^ 1- "J exp [ - J(if + 1\ ++ 1\ + pt x t 2 +pt x t 4 + pt 3 t 4 )] = ^- 2 ^_ |^2 + 

Now this cannot arise from the differentiation of an ‘ordinary’ mathematical function. The
first integral, in fact, is elliptic. It appears, therefore, that var $r_s$ and higher moments of
$r_s$ will depend on non-elementary transcendental functions.

18. Consider the terms contributing to (66). Remembering the symmetry of some of
the expressions in a and b, and with the convention that all suffixes with different letters
are different, we have the following types as far as terms of order $n^5$:


Type 

a ij b ik a lm b lp 

a u b ik a lm b lp 
a ijbn a Jm b lp 
b ik 11 hn b lm 
a ijbik a lm b lk 
®i}bik®im by 


Number

$n(n-1)(n-2)(n-3)(n-4)(n-5) = n^6 - 15n^5$
$n^5$

$2n^5$

$2n^5$

$2n^5$

$2n^5$

$2n^5$


To order $n^5$ the number of terms sums to $n^6 - 4n^5$, which is correct because the total number
of terms is $n^2(n-1)^4$. We then find, to order $n^{-1}$,
var r a = E{r*) - {E(r s )} 2 

= W n ° ~ 16w5 ) E ( a n b ik a im b i P ) + ^ E K b ik a im b ip) + ^E(a, n b ik a lm b [p ) 

+ 2w 6 E(a {j b a a lm b lp ) + 27i 5 R(a l7 6 ifc a iCT 6, m ) + 2n i E{a iJ b ik a lm b tk ) + 2n 6 E{a ij b ik a lm b li )} 
-n\n~lf{E{a ij b.^ + (»- 2) E{a ij b ik )f] 

~ ~ [ _ ^ifc )} 2 + E { a ij b ik a im b iv) + fltj m b[p) 

IV 

+ 2 E{a, ij b ik a lm b lk ) + 2E{a ij b ik a lm b lj )l (69) 

since = E ( a ii b u a im b i } >)> 

in virtue of the symmetry between a and b. 
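The count of terms quoted above can be checked by brute force. The sketch below (the helper `count_terms` is a hypothetical name, not from the paper) enumerates every suffix tuple (i, j, k, l, m, p) for which $a_{ij}b_{ik}a_{lm}b_{lp}$ can be non-zero, that is j ≠ i, k ≠ i, m ≠ l, p ≠ l, and tallies them by the number of distinct suffixes. All six different gives n(n−1)...(n−5) = n⁶ − 15n⁵ + ..., while exactly one coincidence can arise in 11 admissible ways, each contributing n(n−1)...(n−4); together these reproduce the total n⁶ − 4n⁵ to order n⁵.

```python
from itertools import product
from math import perm


def count_terms(n):
    """Count suffix tuples (i, j, k, l, m, p) whose product a_ij b_ik a_lm b_lp can be
    non-zero (a_ii = b_ii = 0, so j != i, k != i, m != l, p != l), tallied by the
    number of distinct suffix values used."""
    total = 0
    by_distinct = {}
    for i, j, k, l, m, p in product(range(n), repeat=6):
        if j == i or k == i or m == l or p == l:
            continue
        total += 1
        d = len({i, j, k, l, m, p})
        by_distinct[d] = by_distinct.get(d, 0) + 1
    return total, by_distinct


n = 7
total, by_distinct = count_terms(n)
print(total, n**2 * (n - 1) ** 4)        # every term: n^2 (n - 1)^4
print(by_distinct[6], perm(n, 6))        # all six suffixes different
print(by_distinct[5], 11 * perm(n, 5))   # exactly one coincidence: 11 admissible patterns
```

Brute force is only practical for small n, but the counts are exact polynomials in n, so agreement for a couple of values of n is a useful check on the bookkeeping.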

The expectations may now be found as a power series in ρ. For example, to find
$E(a_{ij}b_{ik}a_{km}b_{kp})$, we have, for a typical term $a_{13}b_{12}a_{24}b_{25}$, a c.f. (reduced to scale) of

$$\exp\{-\tfrac12(t_1^2+t_2^2+t_3^2+t_4^2+\rho t_1t_2-\rho t_2t_3-t_2t_4+\rho t_3t_4)\}.$$

If the exponential in ρ is expanded we obtain integrals of type

$$\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty}\frac{dt_1\,dt_2\,dt_3\,dt_4}{t_1t_2t_3t_4}\,(t_1t_2-t_2t_3+t_3t_4)^s\exp\{-\tfrac12(t_1^2+t_2^2+t_3^2+t_4^2-t_2t_4)\},$$





which can be evaluated term by term. I find, as far as $\rho^8$,


= 1- 


2/|0\* 23 (p\\ 26 Ip 


3 212 


+rfc + 


13512/ 10512 


Similarly, 




7 t) I 36 3\2 




, , . /2\ 2 rrr 2 2/p\ 2 50 /p\ 4 784 /p\« 76545/p\ s " 

E( a ijb ik a im ip ) - |^ 3e + 3 ^ 2 j ■+" 81 1215\2/ + 55112 ( 2 ) J’ 

, , , /2\ a /ff/M 2 4 /p\ 4 46 (pY 52 (p\H 

E(a {j b ik a lm b lk ) = y ^ 3 -^ 2 ) + 9 ( 2 ) + 135 ( 2 ) + 705 ( 2 )]’ 

^w-(DW + S(M© ,+ S(^ ; 

- 6 si "-' y)' - (l)T(t)‘-4 ®‘ + is® v 4(1)1 


- (j™- 1 - id lid gis) + m j- (,i) 

On substituting in (69) we find

$$\operatorname{var} r_s = \frac{1}{n}(1 - 1.563465\rho^2 + 0.304743\rho^4 + 0.155286\rho^6 + 0.081437\rho^8). \qquad (75)$$

19. Now K. Pearson, in his discussion of grade correlations, gives a result which is
equivalent to

$$\operatorname{var} r_s = \frac{1}{n}(1 - 1.6665507\rho^2 + 0.4336130\rho^4 + 0.1618337\rho^6 + 0.0495042\rho^8). \qquad (76)$$
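The gap between (75) and (76) is easy to see numerically. The sketch below assumes NumPy; the function names are illustrative. It codes the two polynomials directly from the printed coefficients and adds a Monte Carlo estimate of var $r_s$ from bivariate normal samples (sample size, number of replications and seed are arbitrary choices), so that either formula can be compared with simulation. At ρ = 0 both formulae reduce to 1/n, the exact null variance being 1/(n − 1).

```python
import numpy as np


def var_rs_kendall(rho, n):
    """Large-sample var(r_s) for a bivariate normal parent, equation (75)."""
    r2 = rho * rho
    return (1 - 1.563465 * r2 + 0.304743 * r2**2 + 0.155286 * r2**3 + 0.081437 * r2**4) / n


def var_rs_pearson(rho, n):
    """K. Pearson's expression, equation (76), truncated at the same power of rho."""
    r2 = rho * rho
    return (1 - 1.6665507 * r2 + 0.4336130 * r2**2 + 0.1618337 * r2**3 + 0.0495042 * r2**4) / n


def simulated_var_rs(rho, n, reps=4000, seed=0):
    """Monte Carlo variance of Spearman's r_s in samples of n (no ties)."""
    rng = np.random.default_rng(seed)
    xy = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=(reps, n))
    rx = xy[..., 0].argsort(axis=1).argsort(axis=1)   # ranks 0..n-1 within each sample
    ry = xy[..., 1].argsort(axis=1).argsort(axis=1)
    d2 = ((rx - ry) ** 2).sum(axis=1)
    rs = 1 - 6 * d2 / (n**3 - n)
    return rs.var()


for rho in (0.0, 0.5, 0.8):
    print(rho, var_rs_kendall(rho, 200), var_rs_pearson(rho, 200), simulated_var_rs(rho, 200))
```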

Without having given the matter much thought I had expected that (75) and (76) would 
agree, just as (3) and (53) agree in the limit. A check of my own formula having failed to reveal 
any error, I re-examined Pearson’s, with some rather interesting results. Pearson begins 
with the general large-sample formula for the variance of a product-moment correlation 

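$$\operatorname{var} r = \frac{\rho^2}{n}\left\{\frac{\mu_{22}}{\mu_{11}^2} + \frac14\left(\frac{\mu_{40}}{\mu_{20}^2} + \frac{\mu_{04}}{\mu_{02}^2} + \frac{2\mu_{22}}{\mu_{20}\mu_{02}}\right) - \left(\frac{\mu_{31}}{\mu_{11}\mu_{20}} + \frac{\mu_{13}}{\mu_{11}\mu_{02}}\right)\right\}. \qquad (77)$$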

Reducing this in virtue of certain symmetries and putting

$\mu_{20} = \mu_{02} = \frac{1}{12}$, $\mu_{40} = \mu_{04} = \frac{1}{80}$, $\mu_{11} = \frac{1}{12}\rho_g$,

we have




Pearson then evaluates // 22 and p 31 by some characteristically pertinacious mathematics and 
arrives at (76). I cannot find any material error in his work.* 

But it seems to me that the use of equation (77) is itself an error. The large-sample formula
is derived by writing

$$r = \frac{m_{11}}{(m_{20}m_{02})^{1/2}}, \qquad (79)$$

$$\frac{dr}{r} = \frac{dm_{11}}{m_{11}} - \frac12\frac{dm_{20}}{m_{20}} - \frac12\frac{dm_{02}}{m_{02}},$$

and proceeding in the usual way by squaring, summing and substituting for the various
product-moments in terms of known parent values. (The formula is derived as an example
in my Advanced Theory, vol. 1, p. 211.) In short, the formula allows for sampling variation
in the variances $m_{20}$ and $m_{02}$, whereas in the calculation of grade correlations the sample
variances are always constant and equal to $\frac{1}{12}(n^2-1)$. If $m_{20}$ and $m_{02}$ are constant then the
variance of r as given by (79) is simply

$$\operatorname{var} r = \frac{1}{n}\,\frac{\mu_{22}-\mu_{11}^2}{\mu_{20}\mu_{02}}, \qquad (81)$$

and a comparison with (78) shows the importance of the difference.

* There is a slight arithmetical error in the coefficient of $\rho^5$ in the expansion of $2\mu_{31}$, which is given by
Pearson as 0.0092660, whereas the correct value is 0.0089525; but this is quite unimportant.

20. Even (81) does not agree with my equation (75). The reason is that the large-sample
theory does not allow for the appearance of expectations of certain terms with tied suffixes.
But it is interesting to observe that the moment $\mu_{22}$ does appear as one of the expectations
in the exact result.

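In fact, for a grade ξ we have, from the Fourier inversion formula,

$$\xi - \tfrac12 = \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{1-e^{-itx}}{it}\,\phi(t)\,dt, \qquad (82)$$

where φ is the c.f. of the normal function α(x). In our convention as to the limiting value of
the integral at zero we may write this as

$$\xi - \tfrac12 = -\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{e^{-itx}}{it}\,\phi(t)\,dt.$$

Thus

$$\mu_{22} = \frac{1}{(2\pi)^4}\iint f(x,y)\,dx\,dy\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty}\frac{\exp(-it_1x-it_2x-it_3y-it_4y)}{t_1t_2t_3t_4}\,\phi(t_1)\phi(t_2)\phi(t_3)\phi(t_4)\,dt_1\,dt_2\,dt_3\,dt_4.$$

The integral with respect to x and y is the c.f. of $(t_1+t_2)$ and $(t_3+t_4)$ and hence is

$$\exp[-\tfrac12\{(t_1+t_2)^2+(t_3+t_4)^2+2\rho(t_1+t_2)(t_3+t_4)\}].$$

Hence

$$\mu_{22} = \frac{1}{(2\pi)^4}\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty}\frac{dt_1\,dt_2\,dt_3\,dt_4}{t_1t_2t_3t_4}\exp[-\tfrac12\{(t_1+t_2)^2+(t_3+t_4)^2+2\rho(t_1+t_2)(t_3+t_4)+t_1^2+t_2^2+t_3^2+t_4^2\}] = \frac{1}{16}E(a_{ij}b_{ik}a_{il}b_{im}),$$

so that

$$\frac{\mu_{22}}{\mu_{20}^2} = 9E(a_{ij}b_{ik}a_{il}b_{im}). \qquad (83)$$

Likewise

$$\frac{\mu_{11}^2}{\mu_{20}^2} = 9\{E(a_{ij}b_{ik})\}^2,$$

so that in (81)

$$\operatorname{var} r = \frac{9}{n}[E(a_{ij}b_{ik}a_{il}b_{im}) - \{E(a_{ij}b_{ik})\}^2]. \qquad (84)$$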


Now if we compare this with (69) the matter becomes clear. In the latter there are certain
tied suffixes, and if they did not appear, e.g. if we had $E(a_{ij}b_{ik}a_{lm}b_{lp})$ instead of $E(a_{ij}b_{ij}a_{lm}b_{lp})$,
the total coefficient of $\{E(a_{ij}b_{ik})\}^2$ would be −9 + 4 + 2 + 2 = −1, and the agreement would
be complete. The more exact method distinguishes classes of case which the large-sample
formula treats as the same.


Estimation of the first product-moment

21. The relative insensitivity of E(t) to departure of the parent population from normality
raises the question whether the statistic $\sin\frac12\pi t$ would provide a good estimator of the first
parent product-moment even in non-normal cases. $E(r_s)$ is more sensitive, but here again the
statistic $2\sin\frac16\pi r_s$ is a possible estimator. Neither, of course, will be as good as the actual
sample product-moment r where such a statistic can be computed; but there is a class of
case, intermediate between the ordinary bivariate frequency table and the contingency
table, in which t and $r_s$ can be found but r cannot, namely, the case in which the rows and
columns are arranged in some order although ranges of a variate are not assigned to them.
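The two estimators can be sketched as follows, assuming untied data and NumPy; `kendall_t`, `spearman_rs`, `rho_from_t` and `rho_from_rs` are illustrative names. The inversions rest on the normal-theory relations E(t) = (2/π) sin⁻¹ρ and, for large n, E($r_s$) ≈ (6/π) sin⁻¹ ½ρ; the simulated parent (ρ = 0.6, n = 800, fixed seed) is an arbitrary choice for illustration.

```python
import numpy as np


def kendall_t(x, y):
    """Kendall's t = S / {n(n-1)/2}, S summing sign(dx) * sign(dy) over pairs; assumes no ties."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    sx = np.sign(x[:, None] - x[None, :])
    sy = np.sign(y[:, None] - y[None, :])
    s = (sx * sy).sum() / 2          # every unordered pair appears twice
    return s / (n * (n - 1) / 2)


def spearman_rs(x, y):
    """Spearman's r_s = 1 - 6 sum d^2 / (n^3 - n); assumes no ties."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    n = len(rx)
    return 1 - 6 * ((rx - ry) ** 2).sum() / (n**3 - n)


def rho_from_t(t):
    return np.sin(np.pi * t / 2)        # inverts E(t) = (2/pi) arcsin(rho)


def rho_from_rs(rs):
    return 2 * np.sin(np.pi * rs / 6)   # inverts the large-n E(r_s) = (6/pi) arcsin(rho/2)


rng = np.random.default_rng(1)
xy = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=800)
x, y = xy[:, 0], xy[:, 1]
print(np.corrcoef(x, y)[0, 1], rho_from_t(kendall_t(x, y)), rho_from_rs(spearman_rs(x, y)))
```

The pairwise sign matrices make `kendall_t` O(n²) in memory, which is fine for a few thousand observations.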


The 2 × 2 table

22. It has been pointed out by Whitfield (1947) that when due regard is paid to ties the
coefficient t for a double dichotomy is the same as $\sqrt{(\chi^2/n)}$. Writing the 2 × 2 table in the form

        a        b     |  a + b
        c        d     |  c + d
      a + c    b + d   |    n          (85)

we have

$$t = \frac{ad-bc}{\{(a+b)(c+d)(a+c)(b+d)\}^{1/2}}. \qquad (86)$$


This expression has one very interesting application. Suppose that a distribution is normal
and is dichotomized at its medians so as to give

        a       ½n − a  |  ½n
     ½n − a       a     |  ½n
       ½n        ½n     |   n          (87)

Then

$$t = \frac{a^2 - (\frac12 n - a)^2}{(\frac12 n)^2} = \frac{4a}{n} - 1. \qquad (88)$$


Now the distribution is antisymmetric about the medians of the two variates. Consequently
any loss of scoring in the calculation of t due to grouping is zero, because any loss due, say,
to grouping in the top left-hand cell is offset by an equal and opposite gain in respect of the
bottom right-hand cell. Hence (88) is exact in the sense that this value would be arrived at
for median dichotomy even if there were no grouping. Hence, since in the limiting case
$t = E(t) = \frac{2}{\pi}\sin^{-1}\rho$, we have exactly

$$\rho = \sin\tfrac12\pi\Big(\frac{4a}{n}-1\Big) = -\cos\frac{2\pi a}{n}. \qquad (89)$$

This is Sheppard's theorem (1898) on median dichotomy.
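A quick simulation illustrates (89). The sketch assumes NumPy and a hypothetical helper `rho_from_median_dichotomy`; it splits a large bivariate normal sample (ρ = 0.5, seed and size arbitrary) at both sample medians, counts the concordant cell in which both variates lie above their medians, and inverts Sheppard's relation.

```python
import numpy as np


def rho_from_median_dichotomy(a, n):
    """Equation (89): rho = sin{pi/2 (4a/n - 1)} = -cos(2 pi a / n), where a is the
    frequency in either concordant cell of a table split at both medians."""
    return -np.cos(2 * np.pi * a / n)


rng = np.random.default_rng(2)
rho, n = 0.5, 200_000
xy = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
above_x = xy[:, 0] > np.median(xy[:, 0])   # exactly n/2 True for continuous data, n even
above_y = xy[:, 1] > np.median(xy[:, 1])
a = int(np.sum(above_x & above_y))          # concordant cell: both above their medians
print(a / n, rho_from_median_dichotomy(a, n))
```

Because both margins are exactly ½n, the cell with both variates below the median also contains a, so the simulated table has exactly the form (87).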

23. If we calculate $r_s$ for a 2 × 2 table we arrive at the same value as for t, with due allowance
for ties in both cases. This is evident from the consideration that in rankings of two
$r_s$ and t are equivalent, and it may also be verified directly from (85). In fact, the denominator
entering into $r_s$ is (Kendall, 1948, p. 29) the product of two factors, one of which is the
square root of

$$\tfrac{1}{12}(n^3-n) - \tfrac{1}{12}\{(a+b)^3-(a+b)+(c+d)^3-(c+d)\} = \tfrac14 n(a+b)(c+d),$$

and the other the square root of $\frac14 n(a+c)(b+d)$. The numerator is $\frac14 n(ad-bc)$, and hence
$r_s$ is also given by (86).
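Both statements about the 2 × 2 table can be checked by expanding a table into individual observations. The sketch below assumes NumPy, codes rows and columns as 0/1, and uses an arbitrary illustrative table (a, b, c, d) = (7, 3, 2, 8); all function names are hypothetical. It scores every pair directly (concordant +1, discordant −1, tied on either variate 0) to obtain t with allowance for ties, computes $r_s$ as the product-moment correlation of midranks, and compares both with (86) and with $\sqrt{(\chi^2/n)}$.

```python
from itertools import combinations
from math import sqrt

import numpy as np


def t_formula(a, b, c, d):
    """Equation (86)."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


def expand(a, b, c, d):
    """Individual (row, column) observations, coded 0/1, for a 2 x 2 table."""
    x = np.repeat([0, 0, 1, 1], [a, b, c, d])
    y = np.repeat([0, 1, 0, 1], [a, b, c, d])
    return x, y


def t_by_pairs(x, y):
    """Kendall's t with allowance for ties: S / sqrt(pairs untied in x * pairs untied in y)."""
    s = untied_x = untied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = int(np.sign(x2 - x1)), int(np.sign(y2 - y1))
        s += dx * dy
        untied_x += dx != 0
        untied_y += dy != 0
    return s / sqrt(untied_x * untied_y)


def midranks(v):
    """Ranks 1..n, tied values receiving the mean of the ranks they occupy."""
    v = np.asarray(v)
    ranks = np.empty(len(v))
    for value in np.unique(v):
        idx = np.flatnonzero(v == value)
        ranks[idx] = np.sum(v < value) + (len(idx) + 1) / 2
    return ranks


def rs_by_midranks(x, y):
    """r_s as the product-moment correlation of midranks, with its numerator and the two sums of squares."""
    dx = midranks(x) - midranks(x).mean()
    dy = midranks(y) - midranks(y).mean()
    num, sxx, syy = (dx * dy).sum(), (dx**2).sum(), (dy**2).sum()
    return num / sqrt(sxx * syy), num, sxx, syy


a, b, c, d = 7, 3, 2, 8
n = a + b + c + d
x, y = expand(a, b, c, d)
rs, num, sxx, syy = rs_by_midranks(x, y)
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
print(t_formula(a, b, c, d), t_by_pairs(x, y), rs, sqrt(chi2 / n))
print(num, n * (a * d - b * c) / 4, sxx, n * (a + b) * (c + d) / 4, syy, n * (a + c) * (b + d) / 4)
```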




24. For the 2 × 2 table we then have the choice of the two estimators $\sin\frac12\pi t$ and $2\sin\frac16\pi r_s$.
They are not in general equal, and, in fact, for small t one is about 50% greater than the
other (since $r_s = t$ here, $\sin\frac12\pi t \approx \frac12\pi t$ while $2\sin\frac16\pi t \approx \frac13\pi t$). This is a little disconcerting until we remember that $r_s$ for a ranking of two is a very
rough measure of relationship and may be seriously affected by grouping,* whereas t is not
much affected when the dichotomy is near the median and relates to a normal population.
It thus appears rather unsafe to use either estimator if the dichotomy is far removed from the
median; and in the contrary case $\sin\frac12\pi t$ is probably better than $2\sin\frac16\pi r_s$.

Example 1. Pretorius (1930) gives a number of distributions varying from the markedly
skew to the nearly normal, together with their moments up to and including those of the
fourth order.

Consider first of all the corrective factor in equation (25). Using Pretorius's values, I find
for this factor the following values (correlation data from his Tables I, II, III and VI,
respectively):

Distribution                       Parent ρ    Corrective factor for non-normality
Marriages                          0.7082      −0.0597
Parents                            0.7349       0.0463
Barometric heights (full year)     0.5807       0.0129
Beans                              0.7811      −0.0095

The corrections are reasonably small.

I then grouped the distributions so as to give dichotomy as near the median as possible,
finding

Marriages
   87,325     34,793  |  122,118
   58,019    121,648  |  179,667
  145,344    156,441  |  301,785

Parents
  276,553     53,948  |  330,501
   75,811    225,370  |  301,181
  352,364    279,318  |  631,682

Barometric heights
   10,384      4,251  |   14,635
    4,250      9,970  |   14,220
   14,634     14,221  |   28,855

Beans
    4,161        517  |    4,678
    1,680      3,082  |    4,762
    5,841      3,599  |    9,440

* If we put n = 2 in (53) we get $E(r_s) = \frac{2}{\pi}\sin^{-1}\rho$, which agrees with the result for t. For small n the
formula $2\sin\frac16\pi r_s$ is badly biased and this affects the calculation for a 2 × 2 table.









As the correlations in all four cases are high I took two further examples quoted in my
Advanced Theory, vol. 1: (p. 27) Tocher's data for cows (r = 0.2189) and (p. 324) Koga and
Morant's data for highest audible pitch (r = −0.6136). The resulting tables were:

Cows
    1,407      1,078  |    2,485
      881      1,546  |    2,427
    2,288      2,624  |    4,912

Audible pitch
      809      1,383  |    2,192
      799        388  |    1,187
    1,608      1,771  |    3,379

I then find for these six tables: 


Distribution 

Product-moment 

correlation 

sin ½πt

2 sin ⅙πr_s

Tetrachoric r 

Marriages 

0-7082 

0-5688 


1 

Parents 

0-7349 

0-7982 

0-6064 


Barometric heights 

0-5807 


0-4268 

0-65 

Beans 

0-7811 


0-5609 


Cows 

0-2189 

0-3145 

■ 

1 

Audible pitch 

-0-6136 

-0-4408 


-l 


The agreement between r and $\sin\frac12\pi t$ is no more than fair, and that between r and $2\sin\frac16\pi r_s$
is much worse. The distribution of marriages is very leptokurtic, but this by itself should
not account for the poor correspondence in that case. The main reason, I think, is the
skewness of the distribution which, though not affecting the correction for non-normality, brings
about a sweeping amalgamation of rows and columns on one side of the median but not on
the other, so that the effect of grouping is substantial and one-sided. I give also some
approximate values of tetrachoric r, which are so bad that more refined calculation is not worth
while. Our methods are at least a great improvement on tetrachorics, which seem to be
extremely sensitive to departure from normality.
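As a check on the tables of Example 1, the sketch below recomputes t from (86) for each 2 × 2 table (t and $r_s$ coincide for such a table, § 23) together with the two estimators. The cell frequencies are those that agree with the marginal totals printed with each table; the dictionary and function names are illustrative only.

```python
from math import pi, sin, sqrt

# 2 x 2 tables of Example 1 as (a, b, c, d): top-left, top-right, bottom-left, bottom-right.
tables = {
    "Marriages": (87325, 34793, 58019, 121648),
    "Parents": (276553, 53948, 75811, 225370),
    "Barometric heights": (10384, 4251, 4250, 9970),
    "Beans": (4161, 517, 1680, 3082),
    "Cows": (1407, 1078, 881, 1546),
    "Audible pitch": (809, 1383, 799, 388),
}


def t_2x2(a, b, c, d):
    """Equation (86): t (equal to r_s) for a 2 x 2 table, allowing for ties."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


for name, cells in tables.items():
    t = t_2x2(*cells)
    print(f"{name:20s} t = {t:.4f}  sin(pi t/2) = {sin(pi * t / 2):.4f}  "
          f"2 sin(pi t/6) = {2 * sin(pi * t / 6):.4f}")
```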

m × n tables

25. In the more general case of an m × n table (m or n greater than 2) we might expect
better results, less information being lost by grouping. The calculation of t can be extended
to such cases without much difficulty. (That of $r_s$ approximates to the calculation of r itself
for grouped bivariate data and need not be separately considered.) The method is best
illustrated by an example.

Example 2. The table given below shows a 4 × 4 grouping of the bean data referred to in the
previous example.

No score results from the number 332 in conjunction with any members in the same row
and column, but a positive score results from all the other cells. Thus, for instance, the score
resulting from the cell in the second row and column is 332 × 5,550. The number in the first




















row and second column, 121, gives a positive score with members lying to the right and
a negative score with those to the left; and so on. The total score is

S = 332(5550 + 126 + 0 + 1420 + 642 + 30 + 1 + 53 + 32)
  + 121(−1128 + 126 + 0 − 5 + 642 + 30 − 0 + 53 + 32) + 0 + 0
  + 1128(1420 + 642 + 30 + 1 + 53 + 32) + 5550(−5 + 642 + 30 − 0 + 53 + 32)
  + 126(−5 − 1420 + 30 − 0 − 1 + 32) + 0 + 5(1 + 53 + 32)
  + 1420(−0 + 53 + 32) + 642(−0 − 1 + 32) + 30(−0 − 1 − 53) = 9,175,210.


      332      121        0        0  |    453
    1,128    5,550      126        0  |  6,804
        5    1,420      642       30  |  2,097
        0        1       53       32  |     86
    1,465    7,092      821       62  |  9,440


The divisor is given by the square root of the product of two factors derived from row and
column totals:

$$U = \tfrac12(9440^2 - 453^2 - 6804^2 - 2097^2 - 86^2) = 19{,}104{,}585,$$

$$V = \tfrac12(9440^2 - 1465^2 - 7092^2 - 821^2 - 62^2) = 17{,}996{,}513.$$

Hence

$$t = \frac{9{,}175{,}210}{\sqrt{(19{,}104{,}585 \times 17{,}996{,}513)}} = 0.49483,$$

$$\sin\tfrac12\pi t = 0.7013 \quad (\text{against } r = 0.7811).$$


The estimate in this case is worse than that given by the 2x2 table, but we might have 
expected this result from the nature of the grouping. 

The arithmetic of determining the score may be systematized. Taking the above table as
an illustration, we form for each row the sum of the members lying to the right of a particular
cell less those lying to the left, e.g.

     121     −332     −453     −453
    5676    −1002    −6678    −6804
    2092      667    −1395    −2067
      86       85       31      −54


The process is repeated for this table by operating on the columns (the sum of the members
lying below a cell less those lying above), giving

    7854     −250    −8042    −8925
    2057     1084     −911    −1668
   −5711     1419     7162     7203
   −7889      667     8526     9324


The score S is then one-half of the sum of the products of these numbers by the corresponding
cell frequencies in the original table, i.e.

2S = (332 × 7854) + (121 × −250) + etc. + (32 × 9324) = 18,350,420.
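The two-pass procedure is a pair of cumulative-sum operations, so it can be sketched compactly with NumPy. In the sketch below (function names are illustrative) `right_minus_left` forms, for every cell, the row members to its right less those to its left; applying it to the transpose gives the "below less above" column pass. The score is checked against a direct pair-by-pair count, and U, V and t are formed as in the text.

```python
import numpy as np

# 4 x 4 grouping of the bean data (Example 2); rows and columns in rank order.
table = np.array([
    [332, 121, 0, 0],
    [1128, 5550, 126, 0],
    [5, 1420, 642, 30],
    [0, 1, 53, 32],
])


def right_minus_left(f):
    """For each cell: sum of the row entries to its right minus those to its left."""
    cum = f.cumsum(axis=1)
    right = f.sum(axis=1, keepdims=True) - cum
    left = cum - f
    return right - left


def score_systematic(f):
    """Rows first, then columns (below minus above); S is half the sum of f times the result."""
    both = right_minus_left(right_minus_left(f).T).T
    return int((f * both).sum()) // 2


def score_pairs(f):
    """Direct count: each cell against every cell in a lower row, +1 to the right, -1 to the left."""
    s = 0
    for i in range(f.shape[0]):
        below = f[i + 1:]
        for j in range(f.shape[1]):
            s += f[i, j] * (below[:, j + 1:].sum() - below[:, :j].sum())
    return int(s)


n = table.sum()
U = (n**2 - (table.sum(axis=1) ** 2).sum()) // 2
V = (n**2 - (table.sum(axis=0) ** 2).sum()) // 2
S = score_systematic(table)
t = S / np.sqrt(float(U) * float(V))
print(S, score_pairs(table), U, V, round(t, 5), round(float(np.sin(np.pi * t / 2)), 4))
```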





As a matter of interest I worked out the value of t for the full 16 × 12 table of the original
bean data, though this involves more arithmetic than one would want to spend on such work
in practice, and found t = 0.6243, $\sin\frac12\pi t$ = 0.8309 against r = 0.7811. The 4 × 4 table used
earlier in this example is a condensation of the original table obtained by amalgamating
consecutive sets of fours by columns and consecutive sets of threes by rows.

26. To sum up, it appears that the use of equation (1) may be moderately reliable for 
grouped data but is not very accurate as providing an estimator of the first product-moment. 
A good deal depends on the nature of the grouping. It is possible that further research may 
provide grouping corrections which will improve the estimator, or that rules may be dictated 
by theoretical considerations which govern the optimum grouping for ranked data to permit 
of the estimation of product-moments. 

I am indebted to Mr. A. K. Gayen, who read this paper in typescript, for calling
my attention to the facts that the order of magnitude of the coefficients in the expansions
of (6) and (7) has been considered from the point of view of elementary errors by C. E.
Quensel (1938, Lunds Univ. Årsskr. N.F. 34, 4, 1) and that E. C. Rhodes (1925, Biometrika,
17, 318) investigated the effect of non-normality on K. Pearson's formula, equation (4),
though his results appear to be inexact owing to the retention in the final formula of
terms of the same order as some which have previously been neglected.


REFERENCES 

Kendall, M. G. (1941). The Advanced Theory of Statistics, 1. (Fourth edition, 1948.) London: Charles
Griffin and Co.

Kendall, M. G. (1948). Rank Correlation Methods. London: Charles Griffin and Co.

Moran, P. A. P. (1948). Rank correlation and product-moment correlation. Biometrika, 35, 203.

Pearson, K. (1907). On further methods of determining correlation. Drapers Co. Res. Mem.
Cambridge University Press.

Pretorius, S. J. (1930). Skew bivariate frequency surfaces, examined in the light of numerical
illustrations. Biometrika, 22, 109.

Sheppard, W. F. (1898). On the application of the theory of error to cases of normal distribution
and normal correlation. Phil. Trans. A, 192, 101.

Whitfield, J. W. (1947). Rank correlation between two variables, one of which is ranked, the other
dichotomous. Biometrika, 34, 292.


Biometrika 36 




TESTS OF SIGNIFICANCE IN HARMONIC ANALYSIS
By H. O. HARTLEY
1. Introduction

The classical harmonic analysis and closely related periodogram analysis have, in recent
years, been the subject of severe criticisms (Kendall, 1946a, b). These have been mainly
of two kinds:

(a) That the analysis has been widely misused in situations where it is not appropriate
and has thereby led to faulty conclusions.

(b) That the tests of significance used are based on the assumption of a random series and
are therefore hardly ever applicable.

In this paper we do not wish to deal with (a) except to emphasize that it is the misuse of
a good tool that should be criticized. Harmonic and periodogram analysis have, of course,
been useful in their definite but restricted fields of application. When a reasonable theory
suggests that the systematic component of the data is composed of a moderate number of
sinusoidal terms, such an analysis is appropriate. As examples, we may mention here
investigations into instrumental error and resonance behaviour, analysis of sound tracks and of
non-centralities in the surface of circular machine parts, numerous problems in astronomy and
meteorology and many others. On the other hand, it would be inappropriate to use
periodogram analysis when determining the period of what is suspected to be a heavily damped
vibration, simply because the dominant harmonics in the Fourier expansion of such a
vibration bear no relation to its frequency and the higher harmonics required for its representation
will not become apparent. The procedure is just as inappropriate as would be an attempt to
obtain, say, a quadratic regression by harmonic analysis.

It is equally inappropriate to use periodogram analysis in a search for ‘periods’ vaguely 
defined by verbal descriptions, usually of a kind to suggest that a ‘period’ is a time interval 
for which the serial correlation is high. It is not surprising that ‘periods’ so defined are best
determined, if they must be, by the correlogram. Or, again, if we suspect the series to be 
generated by an autoregressive scheme, harmonics are unsuited for the estimation of its 
parameters. 

However, in this paper we want to deal with (b), and here we should quote Kendall's
(1946a) summary of his discussion of the Schuster, Walker and Fisher tests:

‘All the tests we have described are based on random normal variation in the original 
series; but in practice nobody would embark on the labour of a periodogram analysis unless 
he had satisfied himself that the data were not random. It seems to me therefore that these 
tests are really off the main point, being tests based on a hypothesis which we have already 
rejected. They are not without their usefulness, however. We may assume with some
confidence that if a particular intensity in the series is not shown as significant on the hypothesis of
random variation, it is not significant when the series is systematic. What does not follow is 
that if one intensity is significant, then others must be so even if they exceed the significance 
values; for they are not independent of the significant value, at least for short series. What we 
ought to do perhaps is to extract the component which is considered significant from the 





series and then analyse the remainder; and so on as long as significant terms appear. But this 
is hardly a practical computational possibility. Tests of significance in the periodogram, as 
in the correlogram, remain undiscovered.’ 

Kendall's main point, therefore, is that the ‘significance’ of an observed intensity may be
‘misleading’ in that it does not necessarily indicate the reality of the period for which it
was observed but, possibly, that of other periods or, indeed, quite a different non-random
behaviour of the series.

Whilst it would appear to be difficult to develop a test entirely free from the above criticisms,
we will show that most of the difficulties can be overcome by using tests in which the
periodogram intensities are independent or slightly correlated, and by applying some recent results
in analysis of variance test distributions.

We shall fix the ideas by developing tests first under very restrictive assumptions, but will
proceed step by step to more general hypotheses.


2. Test for the maximum harmonic intensity


Let $y_t$ (t = 0, 1, ..., n − 1) be a series of n observations at equidistant intervals of the
independent variable x. Without loss of generality we may assume that $y_t$ was observed at
$x_t = 2\pi t/n$. Harmonic analysis of this series consists in the fitting of, say, 2m + 1 regression
coefficients $a_0, a_1, \ldots, a_m, b_1, \ldots, b_m$ in the form of the regression function

$$\eta = a_0 + \sum_{i=1}^{m}(a_i\cos it\gamma + b_i\sin it\gamma), \qquad (1)$$

where $\gamma = 2\pi/n$.

It follows from standard regression technique and from the exact orthogonality of the
trigonometric functions that

$$a_0 = \frac1n\sum_{t=0}^{n-1}y_t = \bar y, \qquad a_i = \frac2n\sum_{t=0}^{n-1}y_t\cos it\gamma, \qquad b_i = \frac2n\sum_{t=0}^{n-1}y_t\sin it\gamma, \qquad (2)$$

from which we compute the ‘observed intensities’ $S_i^2 = a_i^2 + b_i^2$ corresponding to period i.
Moreover, if $m \le \frac12(n-1)$ the residual sum of squares is given by

$$R^2 = \sum_{t=0}^{n-1}(y_t-\eta_t)^2 = \sum_{t=0}^{n-1}(y_t-\bar y)^2 - \tfrac12 n\sum_{i=1}^{m}(a_i^2+b_i^2). \qquad (3)$$

In order to test the significance of the harmonic term with the largest intensity, $S^2_{\max}$,
we start from the hypothesis of a completely random series, say ${}_mH_0$, i.e. we assume

${}_mH_0$: $y_t = \epsilon_t$, where the $\epsilon_t$ are independent normal deviates with variance $\sigma^2$.

Under this hypothesis, the m intensities $\frac12 nS_i^2/\sigma^2$ (i = 1, 2, ..., m) are all independent χ²
values, each based on two degrees of freedom. The chance for the largest intensity to exceed
a given level x, say, is therefore given by

$$1 - (1 - e^{-\frac12 x})^m, \qquad (4)$$

which is Walker's (1914) criterion. Fisher (1929) has obtained an ingenious exact test for
the case $m = \frac12(n-1)$, n odd.

Here we prefer to convert Walker's criterion into an exact test by making use of the residual
(3) as an independent estimate of σ² and then to ‘studentize’ (Hartley, 1944) Walker's
criterion. The resulting test is one for the maximum variance ratio

$$F_{\max} = \tfrac14 nS^2_{\max}(n-2m-1)/R^2. \qquad (5)$$
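Equations (2), (3) and (5) translate directly into a few array operations. The sketch below assumes NumPy; `harmonic_fmax` is a hypothetical name, and the example series (a mean of 5, a cosine of amplitude 2 at i = 3, unit normal noise, n = 101, m = 10, fixed seed) is purely illustrative. It requires 2m + 1 < n so that ν = n − 2m − 1 is positive.

```python
import numpy as np


def harmonic_fmax(y, m):
    """Fit a_0 and m harmonics to y_t observed at x_t = 2 pi t / n using the orthogonal
    sums (2); return a_0, a_i, b_i, the residual sum of squares (3), F_max (5) and
    the harmonic number j of the largest intensity."""
    y = np.asarray(y, dtype=float)
    n = y.size
    if 2 * m + 1 >= n:
        raise ValueError("need n > 2m + 1 so that nu = n - 2m - 1 > 0")
    t = np.arange(n)
    gamma = 2 * np.pi / n
    i = np.arange(1, m + 1)[:, None]                        # harmonic numbers as a column
    a = 2 / n * (y * np.cos(i * t * gamma)).sum(axis=1)     # a_i
    b = 2 / n * (y * np.sin(i * t * gamma)).sum(axis=1)     # b_i
    s2 = a**2 + b**2                                        # observed intensities S_i^2
    r2 = ((y - y.mean()) ** 2).sum() - 0.5 * n * s2.sum()   # residual sum of squares, (3)
    nu = n - 2 * m - 1
    f_max = 0.25 * n * s2.max() * nu / r2                   # maximum variance ratio, (5)
    return y.mean(), a, b, r2, f_max, int(s2.argmax()) + 1


rng = np.random.default_rng(3)
n, m = 101, 10
tt = np.arange(n)
y = 5 + 2 * np.cos(3 * 2 * np.pi * tt / n) + rng.normal(0, 1, n)
a0, a, b, r2, f_max, j = harmonic_fmax(y, m)
print(j, f_max)
```

Computing R² through (3) rather than from fitted values relies on the exact orthogonality of the harmonics for i ≤ m < n/2.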




To find the probability integral of $F_{\max}$, let $\phi_\nu(s)$ denote the distribution of a sample standard
deviation based on ν degrees of freedom; then the chance of $F_{\max} < F^*$ is given by

$$P(F^*) = \int_0^\infty \phi_\nu(s)(1-\exp\{-s^2F^*\})^m\,ds, \qquad (6)$$

where ν = n − 2m − 1.

This integral was evaluated by Finney (1941), who found

$$P(F^*) = \sum_{r=0}^{m}(-1)^r\binom{m}{r}\Big(1+\frac{2rF^*}{\nu}\Big)^{-\frac12\nu}. \qquad (7)$$

Earlier the present writer (Hartley, 1938) suggested a method of approximating to the
integral, resulting here in the simpler formula

$$P(F^*) \approx \{1-(1+2F^*/\nu)^{-\frac12\nu}\}^m, \qquad (8)$$

which is discussed in detail by Finney. A further approximation, which we use in the examples
of § 7, is as follows.

Instead of evaluating the upper 100α% point of the distribution (7), use the upper
100α/m% point of the F distribution based on 2 and ν degrees of freedom. This approximation
is, of course, only valid for upper percentage points.
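The exact result (7) and the two approximations are simple enough to compare directly. The sketch below uses only the Python standard library; function names are illustrative. It relies on the closed form P(F > f) = (1 + 2f/ν)^(−ν/2) for the F distribution with 2 and ν degrees of freedom, which makes the 100α/m% point explicit, and finds the exact upper point of (7) by bisection. Since P(F_max > f) cannot exceed m times the single-intensity tail, the 100α/m% point is a valid upper starting value.

```python
from math import comb


def p_exact(f, m, nu):
    """Finney's (7): P(F_max < f) for m independent intensities and nu residual d.f."""
    return sum((-1) ** r * comb(m, r) * (1 + 2 * r * f / nu) ** (-nu / 2) for r in range(m + 1))


def p_hartley(f, m, nu):
    """The simpler approximation (8)."""
    return (1 - (1 + 2 * f / nu) ** (-nu / 2)) ** m


def f2_upper_point(alpha, nu):
    """Upper 100*alpha % point of F(2, nu), solving (1 + 2f/nu)^(-nu/2) = alpha."""
    return nu / 2 * (alpha ** (-2 / nu) - 1)


def exact_upper_point(alpha, m, nu):
    """Upper 100*alpha % point of F_max from (7), by bisection."""
    lo, hi = 0.0, f2_upper_point(alpha / m, nu)   # the 100 alpha/m % point bounds it above
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if 1 - p_exact(mid, m, nu) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


m, nu, alpha = 10, 20, 0.05
print(exact_upper_point(alpha, m, nu), f2_upper_point(alpha / m, nu))
```

The alternating sum in (7) loses precision for very large m, but for the moderate numbers of harmonics considered here it is adequate.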


3. The power of the $F_{\max}$ test

In order to deal with Kendall's objections against the use of criteria based on random series
$y_t$, we must investigate the behaviour of the test under alternative hypotheses. This is
best linked with an examination of its power function; the appropriate set of alternative
hypotheses is as follows:

${}_mH_1$: The series $y_t$ is composed of a systematic harmonic series $Y_t$ and a random remainder
$\epsilon_t$, i.e. we have

$$y_t = Y_t + \epsilon_t = A_0 + \sum_{i=1}^{m}(A_i\cos it\gamma + B_i\sin it\gamma) + \epsilon_t,$$

where var $\epsilon_t = \sigma^2$. Of the m periods i = 1, ..., m, some (say l) have positive amplitudes and
the remaining m − l have zero amplitudes. More precisely

$A_k^2 + B_k^2 > 0$ for k in (k),

where (k) is the subset of positive amplitudes in the total set of m periods;

$A_h^2 + B_h^2 = 0$ for h in (h),

where (h) is the set of m − l ‘zero-amplitude periods’ complementary to (k).

Under this hypothesis, $R^2/\sigma^2$ is again distributed as χ² with ν = n − 2m − 1 degrees of freedom,
the intensities $\frac12 nS_h^2/\sigma^2$ are again independent χ² values with 2 degrees of freedom, whilst the
intensities $\frac12 nS_k^2/\sigma^2$ are independent non-central χ² values (e.g. Patnaik, 1949) with
non-centralities given by $\lambda_k = n(A_k^2+B_k^2)/2\sigma^2$.

Patnaik has shown that the distribution of a non-central χ² having a non-centrality λ
and based on n′ degrees of freedom is, to a close degree of approximation, given by the
distribution of $\rho\chi^2_{\nu'}$, where $\chi^2_{\nu'}$ is a central χ² based on ν′ degrees of freedom given by

$$\nu' = \frac{(n'+\lambda)^2}{n'+2\lambda}, \qquad (9)$$

whilst the scale factor ρ is given by

$$\rho = \frac{n'+2\lambda}{n'+\lambda}. \qquad (10)$$





To a close approximation, therefore, the probability, say $p_k(x)$, for $\frac12 nS_k^2/\sigma^2$ to be below a given
level x is equal to the probability of χ² based on $\nu_k$ degrees of freedom being smaller than
$x/\rho_k$, i.e.

$$p_k(x) = \frac{1}{2^{\frac12\nu_k}\Gamma(\frac12\nu_k)}\int_0^{x/\rho_k} e^{-\frac12\tau}\tau^{\frac12\nu_k-1}\,d\tau, \qquad (11)$$

where, from (9) and (10),

$$\nu_k = \frac{\{2 + n(A_k^2+B_k^2)/2\sigma^2\}^2}{2 + n(A_k^2+B_k^2)/\sigma^2}, \qquad \rho_k = \frac{2 + n(A_k^2+B_k^2)/\sigma^2}{2 + n(A_k^2+B_k^2)/2\sigma^2}. \qquad (12)$$

It will be noted that $\nu_k$ and $\rho_k$ depend on the non-centralities $n(A_k^2+B_k^2)/2\sigma^2$, and these
parameters must be known in order to evaluate the power of the test.
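The quality of Patnaik's approximation for the case used here (n′ = 2 degrees of freedom) can be checked against the exact non-central χ² distribution, written as a Poisson(λ/2) mixture of central χ² distributions. The sketch below uses only the standard library; the series for the regularized incomplete gamma function and all function names are our own illustrative choices, not taken from the paper.

```python
from math import exp, lgamma, log


def chi2_cdf(x, df):
    """P(chi^2_df <= x) from the series for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, z = df / 2, x / 2
    term = total = 1 / a
    k = 1
    while term > total * 1e-16 and k < 10_000:
        term *= z / (a + k)
        total += term
        k += 1
    return total * exp(-z + a * log(z) - lgamma(a))


def ncx2_cdf(x, df, lam, terms=200):
    """Exact non-central chi^2 CDF as a Poisson(lam/2) mixture of central chi^2 CDFs."""
    if lam == 0:
        return chi2_cdf(x, df)
    h = lam / 2
    return sum(exp(-h + j * log(h) - lgamma(j + 1)) * chi2_cdf(x, df + 2 * j) for j in range(terms))


def patnaik_cdf(x, df, lam):
    """Patnaik's approximation (9)-(10): the non-central chi^2 is taken as rho * chi^2_nu'."""
    nu = (df + lam) ** 2 / (df + 2 * lam)
    rho = (df + 2 * lam) / (df + lam)
    return chi2_cdf(x / rho, nu)


# p_k(x) of (11)-(12) uses df = 2 and lam = n (A_k^2 + B_k^2) / (2 sigma^2).
for lam in (0.0, 2.0, 4.0, 10.0):
    print(lam, ncx2_cdf(6.0, 2, lam), patnaik_cdf(6.0, 2, lam))
```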

The chance for all the m values of $\frac12 nS_i^2/\sigma^2$ to be smaller than x is then given by

$$p(x) = (1-e^{-\frac12 x})^{m-l}\prod_{k} p_k(x). \qquad (13)$$

We now denote the probability for V-^max. = \^n8 j ^v)R to fall below a given level 
say) by P(*JF*). This integral is then given by the ‘studentized’ expression of (13). 
Usingp(£) =p(2£ 2 ), we have for the first two terms of the ‘studentized’ integral (Hartley, 
1944) , 

P(jF*)=p[jF*)--(JF*p’-F*p"), ( 14 ) 


where p′ and p″ are the first two derivatives of p. The power of the $F_{\max}$ test can therefore
be computed from (14) as 1 − P. This is the chance of establishing significance, using the $F_{\max}$
test, when the series is composed as defined by ${}_mH_1$. Its evaluation for specified
values of the $(A_k^2 + B_k^2)/\sigma^2$, although laborious, is quite feasible. Numerical examples are
given in § 7.


4. The chance that the $F_{\max}$ test is misleading

We now use the above expression of the power function to examine the frequency with which
a significant result might be ‘misleading’ in the sense of the above criticism made by Kendall.
Strictly speaking, if the maximum observed intensity $\frac12 nS_j^2$ is returned as significant by the
test we should, of course, only conclude that ${}_mH_0$ is not true and that some such hypothesis
as ${}_mH_1$ is true. In practice, however, we wish to conclude more specifically that $A_j^2 + B_j^2 > 0$
for the particular j for which the maximum intensity was observed. The total power, i.e. the
chance of reaching a significant result when ${}_mH_1$ is true, is therefore the sum of the chances
of two situations:

(i) When the observed maximum $\frac12 nS_j^2$ does, in fact, come from the set of positive
intensities (k) (i.e. j is in (k)).

(ii) When the observed maximum $\frac12 nS_j^2$ does, in fact, come from the set (h) of true zero
intensities (i.e. j is in (h)).

The frequency, therefore, of being ‘misled’ by a significant result of the test is given by
the second part (ii) of the total power. Below we shall show that this frequency is, in fact,
very small. We shall prove that this chance is smaller than (m − l)/m times the error of the
first kind, i.e. smaller than α(m − l)/m, if the $F_{\max}$ test is carried out at the 100α% level of
significance.
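The bound α(m − l)/m can be illustrated by simulation before it is proved. The sketch below assumes NumPy; the function names, the series length n = 41, m = 10, a single true cosine term at i = 3 with amplitude 0.5 (unit error variance), and the seed are all hypothetical choices. It applies the exact $F_{\max}$ test from (7) at the 5% level and counts how often a significant result points to a zero-amplitude period. With no true terms (l = 0) the same code estimates the size of the test, which should be close to α.

```python
from math import comb

import numpy as np


def fmax_upper_point(alpha, m, nu):
    """Exact upper 100*alpha % point of F_max from Finney's (7), by bisection."""
    def tail(f):
        return 1 - sum((-1) ** r * comb(m, r) * (1 + 2 * r * f / nu) ** (-nu / 2) for r in range(m + 1))
    lo, hi = 0.0, nu / 2 * ((alpha / m) ** (-2 / nu) - 1)   # the 100 alpha/m % point of F(2, nu)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if tail(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


def simulate_misleading(n=41, m=10, true_terms=((3, 0.5),), alpha=0.05, reps=4000, seed=4):
    """Monte Carlo under mH1 with unit error variance. true_terms lists (period i, amplitude A_i)
    of cosine terms with positive amplitude. Returns the estimated power, the estimated chance
    that F_max is significant with its maximum on a zero-amplitude period, and alpha (m - l) / m."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    gamma = 2 * np.pi / n
    i = np.arange(1, m + 1)[:, None]
    cos_m, sin_m = np.cos(i * t * gamma), np.sin(i * t * gamma)   # shape (m, n)
    signal = np.zeros(n)
    for period, amp in true_terms:
        signal += amp * np.cos(period * t * gamma)
    y = signal + rng.standard_normal((reps, n))
    a = 2 / n * y @ cos_m.T
    b = 2 / n * y @ sin_m.T
    s2 = a**2 + b**2
    r2 = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum(axis=1) - 0.5 * n * s2.sum(axis=1)
    nu = n - 2 * m - 1
    f_max = 0.25 * n * s2.max(axis=1) * nu / r2
    significant = f_max > fmax_upper_point(alpha, m, nu)
    j_max = s2.argmax(axis=1) + 1
    misled = significant & ~np.isin(j_max, [p for p, _ in true_terms])
    l = len(true_terms)
    return significant.mean(), misled.mean(), alpha * (m - l) / m


print(simulate_misleading())
```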





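Differentiating (13) and integrating back over the range x ≤ ξ < ∞, we note that 1 − p(x)
can be written as

$$1 - p(x) = \int_x^\infty \sum_{r\in(k)} p_r'(\xi)\prod_{\substack{i\in(k)\\ i\ne r}} p_i(\xi)\,(1-e^{-\frac12\xi})^{m-l}\,d\xi + \int_x^\infty \prod_{r\in(k)} p_r(\xi)\,(m-l)(1-e^{-\frac12\xi})^{m-l-1}\,\tfrac12 e^{-\frac12\xi}\,d\xi = P_1(x) + P_2(x) \ \text{(say)}. \qquad (15)$$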

We now note that the second term in (15), $P_2(x)$, represents the chance for the maximum
to exceed x and yet to correspond to a period j in the set (h) of m − l zero intensities.
We now consider the ‘studentized’ form of the test.

Let $\phi_\nu(s)$ denote, as before, the distribution of a sample standard deviation based on ν
degrees of freedom; then

$$1 - P(\sqrt{F^*}) = \int_0^\infty \phi_\nu(s)\{1 - p(2s^2F^*)\}\,ds. \qquad (16)$$

Substituting the expression (15) for 1 − p, we have for the power

$$1 - P(\sqrt{F^*}) = \int_0^\infty \phi_\nu(s)P_1(2s^2F^*)\,ds + \int_0^\infty \phi_\nu(s)P_2(2s^2F^*)\,ds = P_3(\sqrt{F^*}) + P_4(\sqrt{F^*}) \ \text{(say)}. \qquad (17)$$

The second term in (17), $P_4(F^*)$, represents the frequency of reaching a misleading conclusion by the test. In order to obtain an upper bound for $P_4$, we remember that the $\chi^2$ integral is a monotonically increasing function of $x$, and hence a decreasing function of $\rho$, and a monotonically decreasing function of its degrees of freedom $\nu$. Further, since from (12) it is obvious that $\nu_k \ge 2$ and $\rho_k \ge 1$, it follows that in (11) $P_r(\xi) \le 1 - e^{-\frac12\xi}$ for every $r$ in $(k)$, and therefore in (17)

$$P_4(F^*) \le \int_0^{\infty} \varphi_\nu(s) \int_{2s^2F^*}^{\infty} \tfrac12 h\,\bigl(1 - e^{-\frac12\xi}\bigr)^{m-1} e^{-\frac12\xi}\,d\xi\,ds \le \frac{m-1}{m}\int_0^{\infty} \varphi_\nu(s)\,\bigl\{1 - \bigl(1 - e^{-s^2F^*}\bigr)^{m}\bigr\}\,ds = \frac{m-1}{m}\,\alpha, \qquad (18)$$

where $\alpha$ is the error of the first kind, i.e. the significance level chosen for the test. It follows, therefore, that if the independent harmonic intensities are used in the $F_{\max}$ test, the chance of being 'misled' by the test in the sense defined is negligible. Kendall's criticisms are, of course, mainly directed against tests involving dependent intensities. We shall discuss this aspect in § 6.
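The bound (18) can be illustrated by simulation. The following is a minimal Monte Carlo sketch for the simplest case $\nu = \infty$ ($\sigma$ known), assuming that each intensity divided by $\sigma^2$ is distributed as $\chi^2$ with 2 degrees of freedom, central for the periods in $(h)$ and non-central for those in $(k)$. It counts how often the maximum is significant at the $100\alpha$ % level and at the same time belongs to a zero-intensity period. The function names, the seed and the non-centralities are illustrative choices, not taken from the paper.

```python
import math
import random


def critical_value(alpha: float, m: int) -> float:
    """Upper 100*alpha % point of the largest of m independent chi-square(2) variates.

    Solves 1 - (1 - exp(-x/2))**m = alpha for x (sigma known, i.e. nu = infinity).
    """
    return -2.0 * math.log(1.0 - (1.0 - alpha) ** (1.0 / m))


def misleading_frequency(deltas, h, alpha=0.05, trials=50_000, seed=1):
    """Monte Carlo estimate of P(maximum is significant and comes from the zero set (h)).

    deltas -- assumed non-centralities of the k periods with real amplitudes
    h      -- number of periods whose true intensity is zero
    """
    rng = random.Random(seed)
    m = len(deltas) + h
    x = critical_value(alpha, m)
    hits = 0
    for _ in range(trials):
        # Non-central chi-square(2) with parameter d: (Z1 + sqrt(d))^2 + Z2^2.
        real = [(rng.gauss(0.0, 1.0) + math.sqrt(d)) ** 2 + rng.gauss(0.0, 1.0) ** 2
                for d in deltas]
        # Central chi-square(2) for each zero-intensity period.
        null = [rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 1.0) ** 2 for _ in range(h)]
        top_null = max(null) if null else -1.0
        top_real = max(real) if real else -1.0
        if top_null > x and top_null > top_real:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    alpha, m = 0.05, 5
    print("bound alpha*(m-1)/m:", alpha * (m - 1) / m)
    for delta in (0.0, 5.0, 20.0):
        print(delta, misleading_frequency([delta], h=m - 1, alpha=alpha))
```

With a single "real" period of non-centrality 0 the estimate should lie near $\alpha h/m = 0.04$, the value of the bound for $m = 5$; as the real amplitude grows, the estimated frequency of being misled decreases.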

5. The distribution of the maximum F ratio when the observed series has a general systematic component

We now proceed to examine the $F_{\max}$ test under the general hypothesis

$$y_t = Y_t + e_t \qquad (t = 0, 1, \ldots, n-1),$$

where the $e_t$ are independent random normal variates. Assuming for convenience that $n$ is odd, and representing the $Y_t$ by a Fourier expansion, we note that the above assumption is equivalent to

$${}_{\frac12(n-1)}H_1:\quad y_t = A_0 + \sum_{i=1}^{\frac12(n-1)} \bigl(A_i \cos(2\pi it/n) + B_i \sin(2\pi it/n)\bigr) + e_t.$$



H. O. Hartley




This hypothesis, therefore, differs from ${}_mH_1$ in that $\frac12(n-1) - m$ further real Fourier terms may occur in the representation of the $Y_t$. It is now convenient to express the magnitude of these additional terms as a percentage of the variance of the $e_t$ by introducing the non-centrality ratio

$$\lambda = \tfrac12 n \sum_{i=m+1}^{\frac12(n-1)} (A_i^2 + B_i^2)/\sigma_e^2. \qquad (19)$$

$\lambda$ will in general depend on the choice of $m$, the maximum period for which a real amplitude is suspected, and on $n$, the number of observations made. If we assume that the systematic series $Y_t$ are the ordinates of a smooth function $Y(x)$, then, from standard Fourier theory, it is easy to prove that, with $m$ chosen sufficiently large, $\lambda$ is as small as may be required for any $n > 2m$. In practice, therefore, if we suspect $Y_t$ to involve terms of any frequency up to order $m$, the $F_{\max}$ test can only be expected to detect all of these if the number of observations $n > 2m$, i.e. if the interval at which observations are made is well below the smallest suspected half wave-length. If we have little information on the smallest half wave-length to be expected and decide on too small values of $m$ and $n$, the $F_{\max}$ test, which is based on the assumption $\lambda = 0$, will be biased by an amount depending on the value of $\lambda$. The effect of a positive $\lambda$ on the $F_{\max}$ probability integral is given by replacing $F$ by $(1 + \lambda/\nu)F$ and $\nu$ by $\nu'$ in (7) and (14), where

$$\nu' = (\nu + \lambda)^2/(\nu + 2\lambda), \qquad \rho = (\nu + 2\lambda)/(\nu + \lambda). \qquad (20)$$

The above modification gives the approximate $F_{\max}$ distribution under the most general assumption of a systematic component in the series. We would, however, stress again that the test is inappropriate unless we know that the $Y_t$ can be represented as an aggregate of a moderate number of Fourier terms, so that $m$ can be chosen with a moderate value and yet $\lambda$ will be expected to be 0 or small.
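One way to read (20), assumed here, is as a two-moment fit: the residual sum of squares, now non-central with $\nu$ degrees of freedom and parameter $\lambda$, is replaced by $\rho\chi^2_{\nu'}$ with the same mean and variance (the device used for the non-central $\chi^2$ by Patnaik, 1949). The sketch below, with hypothetical function names, computes $(\nu', \rho)$ and checks that the first two moments agree.

```python
def adjusted_residual(nu: float, lam: float) -> tuple[float, float]:
    """Return (nu', rho) of (20): nu' = (nu+lam)^2/(nu+2lam), rho = (nu+2lam)/(nu+lam)."""
    nu_prime = (nu + lam) ** 2 / (nu + 2.0 * lam)
    rho = (nu + 2.0 * lam) / (nu + lam)
    return nu_prime, rho


def first_two_moments(nu: float, lam: float):
    """Mean and variance of the non-central residual and of its fit rho * chi-square(nu')."""
    nu_prime, rho = adjusted_residual(nu, lam)
    exact = (nu + lam, 2.0 * (nu + 2.0 * lam))            # non-central chi-square(nu, lam)
    fitted = (rho * nu_prime, 2.0 * rho ** 2 * nu_prime)  # rho * chi-square(nu')
    return exact, fitted


if __name__ == "__main__":
    for lam in (0.0, 2.0, 5.0):
        print(lam, adjusted_residual(11.0, lam), first_two_moments(11.0, lam))
```

For $\lambda = 0$ the pair reduces to $(\nu, 1)$, so the unadjusted test is recovered.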

6. Harmonic analysis as periodogram analysis

In certain problems in which we search for the unknown periods of an aggregate of Fourier terms, such a search can be made by a harmonic analysis, choosing the total $x$ range over which observations were made as the fundamental wave-length and evaluating intensities over a doubly limited range $m' < i \le m$. Often, however, such a wide-interval periodogram will miss certain periods, and intensities must be computed at a finer interval. In this case they will no longer be independent, but as their correlations will in general be small, the distribution of the $F_{\max}$ test is still tractable. We hope to show this in a subsequent paper.

7. Illustrative examples for the test procedure

The examples given below consist of short series and are given as illustrations only.

(a) The monthly mean temperatures, in °F, for Greenwich during 1939-40 were taken from the Smithsonian World Weather Records (Clayton, H. H. & F. L., 1947), and are given below to the nearest degree and with 30° F. as origin§:



        Jan.  Feb.  Mar.  Apr.  May  June  July  Aug.  Sept.  Oct.  Nov.  Dec.
1939     12    13    13    18    23    29    31    33    29     18    19     8
1940      1     8    14    18    26    33    31    32    27     20    15     9


§ Brunt (1917, p. 210) describes a large-scale harmonic analysis of the Greenwich temperature records of earlier years, but on different lines.




We may attempt a representation of temperature as a harmonic series with a 2-year fundamental period. The annual effect will then occur for period $i = 2$, and we allow for further superimposed Fourier terms for periods $i = 1, 3, 4, 5$ and 6. Below is given the analysis of variance table for such an analysis:



                 Sum of squares   D.F.   Mean square   F ratio
Total                 1986.0        23
Period i = 1             1.4         2        0.7          0.1
       i = 2          1830.2         2      915.1        137.0
       i = 3            53.0         2       26.5          4.0
       i = 4             7.6         2        3.8          0.6
       i = 5            18.3         2        9.2          1.4
       i = 6             2.2         2        1.1          0.2
Residual                73.3        11        6.7



The 'Residual' component has been computed from (3) with Fourier coefficients $a_i$ and $b_i$ given by (2), whilst the 'Period' components are given by $12(a_i^2 + b_i^2)$. The maximum $F$ ratio, 137, is that for the annual period ($i = 2$). If we wish to test significance at the 5 % level and use the approximation referred to at the end of § 2, we should compare this $F$ ratio with the $5/m = 5/6$ % point of the ordinary $F$ distribution for 2 and 11 degrees of freedom. The ratio is, of course, highly significant.
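As an illustration of how the 'Period' lines of such a table are formed, here is a sketch that computes harmonic coefficients and the components $12(a_i^2 + b_i^2) = \tfrac12 n(a_i^2 + b_i^2)$ for a series of length $n$. It assumes that (2) is the usual definition $a_i = (2/n)\sum_t y_t\cos(2\pi it/n)$, $b_i = (2/n)\sum_t y_t\sin(2\pi it/n)$, and that the residual may be taken as the total sum of squares minus the fitted components; neither formula is reproduced in this section, so both are assumptions, and the function names are hypothetical. The check uses a synthetic series rather than the Greenwich data.

```python
import math


def period_components(y, periods):
    """Return {i: (n/2)(a_i^2 + b_i^2)} for the requested harmonic numbers i.

    Assumes a_i = (2/n) sum y_t cos(2 pi i t / n) and b_i likewise with sin,
    for 0 < i < n/2 (each such component then carries 2 degrees of freedom).
    """
    n = len(y)
    out = {}
    for i in periods:
        a = 2.0 / n * sum(v * math.cos(2 * math.pi * i * t / n) for t, v in enumerate(y))
        b = 2.0 / n * sum(v * math.sin(2 * math.pi * i * t / n) for t, v in enumerate(y))
        out[i] = 0.5 * n * (a * a + b * b)
    return out


def anova(y, periods):
    """Total SS about the mean, period components, and residual SS and d.f. by difference."""
    n = len(y)
    mean = sum(y) / n
    total = sum((v - mean) ** 2 for v in y)
    comps = period_components(y, periods)
    residual = total - sum(comps.values())
    residual_df = (n - 1) - 2 * len(periods)
    return total, comps, residual, residual_df


if __name__ == "__main__":
    # Synthetic 24-month series: a pure annual wave (i = 2) of amplitude 5 about 20.
    y = [20 + 5 * math.cos(2 * math.pi * 2 * t / 24) for t in range(24)]
    print(anova(y, range(1, 7)))
```

With the 24 monthly values above as `y` and `periods=range(1, 7)`, the same call gives a table of the same form as the analysis of variance shown.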

We now repeat the test for the residuals of the original series from the annual wave. These need not be computed, as their harmonic analysis would automatically reproduce the intensities of the original series shown for $i = 1, 3, \ldots, 6$ in the above table. The maximum of these five $F$ ratios, 4.0, is that for the 8-months period ($i = 3$), and this should be compared with the $5/5 = 1$ % point of $F$. This is 7.2, and the 8-months period is therefore not significant. This completes the test, as the search for significant periods must stop as soon as an insignificant result is reached.
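The two-stage procedure just described can be sketched as follows. For 2 and $\nu$ degrees of freedom the upper tail of $F$ has the closed form $P(F > x) = (1 + 2x/\nu)^{-\nu/2}$, so the $100\alpha/m$ % point used in the approximation of § 2 can be computed directly; for $\alpha/m = 0.01$ and $\nu = 11$ it gives about 7.2, the value quoted above. The function names are illustrative, the sketch uses only the $\alpha/m$ approximation rather than the exact $F_{\max}$ distribution, and it keeps the residual mean square fixed between stages, as in the worked example.

```python
def f2_upper_point(p: float, nu: float) -> float:
    """Upper 100p % point of F with (2, nu) d.f.: solves (1 + 2x/nu)**(-nu/2) = p."""
    return 0.5 * nu * (p ** (-2.0 / nu) - 1.0)


def sequential_fmax(f_ratios: dict, nu: float, alpha: float = 0.05):
    """Test the largest F ratio against the alpha/m point; remove it if significant; repeat.

    f_ratios maps period number -> F ratio on (2, nu) d.f.
    Returns the periods declared significant, in the order found.
    """
    remaining = dict(f_ratios)
    found = []
    while remaining:
        m = len(remaining)
        period, f_max = max(remaining.items(), key=lambda kv: kv[1])
        if f_max <= f2_upper_point(alpha / m, nu):
            break  # the search stops at the first non-significant maximum
        found.append(period)
        del remaining[period]
    return found


if __name__ == "__main__":
    table = {1: 0.1, 2: 137.0, 3: 4.0, 4: 0.6, 5: 1.4, 6: 0.2}
    print(round(f2_upper_point(0.05 / 6, 11), 2))  # 5/6 % point, first stage
    print(round(f2_upper_point(0.05 / 5, 11), 2))  # 1 % point, second stage
    print(sequential_fmax(table, nu=11))
```

Applied to the $F$ ratios of the table, the sketch retains only the annual period ($i = 2$), in agreement with the conclusion above.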

We may ask the question how large a real amplitude of the 8-months period should have been for it to have a reasonable chance of being detected by the second application of the $F_{\max}$ test used above. The power (14) of the test depends on the values of all the five amplitudes $A_i^2 + B_i^2$ $(i = 1, 3, \ldots, 6)$. We evaluate it for the situation when only the 8-months period ($i = 3$) has a real amplitude, measured by the non-centrality ratio

$$\lambda = 12(A_3^2 + B_3^2)/\sigma^2,$$

whilst the amplitudes for periods $i = 1, 4, 5$ and 6 are zero. Values of the power are tabulated below for a few selected values of $\lambda$. Below these we show the corresponding power when $\nu = \infty$, i.e. when the test is based on a very long series. It will be seen that $\lambda$ must be of the order 20 (i.e. the ratio $\tfrac12(A_3^2 + B_3^2)/\sigma^2$ should be of the order 5/6) to be detectable with a reasonable chance:

Power of $F_{\max}$ test, $m = 5$, only one amplitude real

λ           0      5      10     15     20
ν = 11    0.05   0.15   0.44   0.59   0.78
ν = ∞     0.05   0.29   0.62   0.85   0.95


The power is, of course, considerably larger if some of the other four amplitudes are real, as it will then contain the chance of detecting these other amplitudes instead of the one corresponding to $i = 3$. The power when all five non-centrality ratios are equal to $\lambda$ is shown below for $\nu = \infty$.

Power of $F_{\max}$ test, $m = 5$, all five amplitudes equal and greater than zero

λ          0      5      10
ν = ∞    0.05   0.78   0.99
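For $\nu = \infty$ the power can be computed directly if one assumes that each intensity, divided by $\sigma^2$, is distributed as $\chi^2$ with 2 degrees of freedom, non-central with parameter $\lambda_i$ defined as for $i = 3$ above. The power is then $1 - \prod_i P\{\chi'^2_2(\lambda_i) \le x\}$, with $x$ the 5 % point of the largest of five central variates, and each non-central integral can be written as a Poisson mixture of central $\chi^2$ integrals with even degrees of freedom. This is a sketch under these assumptions with hypothetical function names; its figures are meant for comparison with the $\nu = \infty$ rows above and need not agree with them to the last digit.

```python
import math


def chi2_even_cdf(x: float, k: int) -> float:
    """P(chi-square with 2k d.f. <= x) = 1 - exp(-x/2) * sum_{i<k} (x/2)^i / i!."""
    half = x / 2.0
    term, acc = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        acc += term
    return 1.0 - math.exp(-half) * acc


def noncentral_chi2_2df_cdf(x: float, lam: float, terms: int = 200) -> float:
    """Non-central chi-square, 2 d.f., as a Poisson(lam/2) mixture of chi-square(2 + 2j)."""
    weight = math.exp(-lam / 2.0)
    total = 0.0
    for j in range(terms):
        total += weight * chi2_even_cdf(x, j + 1)
        weight *= (lam / 2.0) / (j + 1)
    return total


def fmax_power_known_sigma(lams, alpha: float = 0.05) -> float:
    """Power of the F_max test with sigma known (nu = infinity) for non-centralities lams."""
    m = len(lams)
    x = -2.0 * math.log(1.0 - (1.0 - alpha) ** (1.0 / m))  # 100*alpha % point of the maximum
    prob_all_below = 1.0
    for lam in lams:
        prob_all_below *= noncentral_chi2_2df_cdf(x, lam)
    return 1.0 - prob_all_below


if __name__ == "__main__":
    for lam in (0, 5, 10, 15, 20):
        print(lam, round(fmax_power_known_sigma([lam, 0, 0, 0, 0]), 3))
    for lam in (5, 10):
        print(lam, round(fmax_power_known_sigma([lam] * 5), 3))
```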

(b) A similar analysis was carried out for the records of total precipitation at Greenwich during the same 24-month period. None of the periods was found to be significant.

REFERENCES

Brunt, D. (1917). The Combination of Observations. Cambridge University Press.

Clayton, H. H. & F. L. (1947). World Weather Records. The Smithsonian Institution, Washington.

Finney, D. J. (1941). Ann. Eugen., Lond., 11, 136.

Fisher, R. A. (1929). Proc. Roy. Soc. A, 125, 64.

Hartley, H. O. (1938). J. R. Statist. Soc. Suppl. 5, 80.

Hartley, H. O. (1944). Biometrika, 33, 173.

Kendall, M. G. (1946a). The Advanced Theory of Statistics. London: Charles Griffin and Co.

Kendall, M. G. (1946b). Contributions to the Study of Oscillatory Time Series. Cambridge University Press.

Patnaik, P. B. (1949). Biometrika, 36, 202.

Schuster, Sir Arthur (1898). Terr. Magn. Atmos. Elect. 3, 13.

Walker, Sir Gilbert (1914). Mem. Indian Met. Dep. 21, part 9.





THE NON-CENTRAL $\chi^2$- AND $F$-DISTRIBUTIONS AND
THEIR APPLICATIONS†

By P. B. PATNAIK, University College, London 
1. Introductory 

In the Neyman-Pearson theory of testing statistical hypotheses, the efficiency of a statistical test is to be judged by its power of detecting departures from the null hypothesis. Thus, besides knowing the random sampling distribution of a given statistic $T$ under this hypothesis, say $H_0$, it is also necessary to know the distribution of $T$ under admissible hypotheses alternative to $H_0$. Hence the power function of the test is obtained. In the case of the well-known tests using $t$ and $F$, the evaluation of their power functions involves the use of what have been called non-central distributions. For example, if we are applying the $t$-test to examine whether a sample has come from a normal population with mean $\mu = 0$ ($H_0$), we know that under $H_0$, $t$ has a 5 % chance of exceeding the 5 % point of its distribution. But in order to compute the power of the test we wish to know the chance that $t$ exceeds this point when $\mu$ has alternative values, not equal to zero. This chance is given by the non-central $t$-integral. This distribution has been studied by Fisher (1931), Neyman (1935), Neyman & Tokarska (1936) and Johnson & Welch (1939). In a similar way, the non-central $\chi^2$- and $F$-distributions arise in consideration of the power functions of the $\chi^2$- and variance-ratio tests.

The power function may be used either to determine the extent of the departures from $H_0$ in a given direction which will be detected as significant (at a prescribed level) with a given probability, or to determine in advance the size of experiment necessary to ensure that a worth-while difference will be established as significant, if it exists. But apart from its value in this connexion, the study of non-central distributions is of considerable interest. The mathematical forms of these distributions of $t$, $\chi^2$ and $F$ have long been known, but their use without extensive tabling has not been easy. The present paper is therefore concerned with two lines of investigation:

(a) The derivation of certain approximations to the probability integrals of (i) non-central $\chi^2$, and (ii) the ratio of non-central $\chi^2$ to an independent central $\chi^2$, which we have termed non-central $F$. These approximations, depending on tabled functions, permit easy calculation.

(b) Discussion of the ways in which these distributions may be used in connexion with the power functions of statistical tests.

2. The non-central $\chi^2$-distribution
2.1. Geometrical derivation

As is well known, the statistic $\chi^2$ is defined as the sum of squares of (say) $n$ independent random deviates, $\xi_i$, all drawn from a normal population with mean 0 and standard deviation $\sigma$, viz.

$$\chi^2 = \sum_{i=1}^{n} \xi_i^2/\sigma^2.$$

† Part of a thesis approved for the degree of Ph.D. of the University of London.



P. B. Patnaik


If, however, the mean of $\xi_i$ is $a_i$ and we write

$$x_i = \xi_i - a_i,$$

then we have the non-central $\chi^2$ defined by

$$\chi'^2 = \sum_{i=1}^{n} (x_i + a_i)^2/\sigma^2.$$

The probability distribution of $\chi'^2$ has been obtained by Fisher (1928) as a particular case of the distribution of the multiple correlation coefficient. A purely analytical proof was given by Tang (1938). As $\chi'^2$ is a generalized form of $\chi^2$, it may be of interest to compare its geometrical representation with the familiar geometry of $\chi^2$. We therefore give a direct geometrical derivation of the $\chi'^2$-distribution.

Without loss of generality we shall assume in what follows that $\sigma = 1$, so that the probability law of the $x$'s is given by

$$p(x_1, \ldots, x_n) = (2\pi)^{-\frac12 n}\exp\Bigl(-\tfrac12\sum_{i=1}^{n} x_i^2\Bigr). \qquad (1)$$

Then

$$\chi'^2 = \sum_{i=1}^{n} \xi_i^2.$$

In the $n$-dimensional space of the $\xi$'s, suppose $O$ is the origin, $P$ the point $(\xi_1, \ldots, \xi_n)$, $A$ the point $(a_1, \ldots, a_n)$, $\angle POA = \theta$, and $M$ the foot of the perpendicular from $P$ on $OA$, as shown in Fig. 1. Then

$$OP^2 = \chi'^2, \qquad OA^2 = \sum_{i=1}^{n} a_i^2 = \lambda, \ \text{say}.$$



Fig. 1 

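From (1), the probability density at $P$ is proportional to

$$\exp\Bigl[-\tfrac12\sum_{i=1}^{n}(\xi_i - a_i)^2\Bigr] = \exp\bigl[-\tfrac12 PA^2\bigr] = \exp\bigl[-\tfrac12\bigl(\chi'^2 + \lambda - 2\chi'\sqrt{\lambda}\cos\theta\bigr)\bigr]. \qquad (2)$$

If we keep $OP$ and $\theta$ fixed, $P$ describes an $(n-1)$-dimensional sphere of radius $PM = \chi'\sin\theta$, with its surface area proportional to $(\chi'\sin\theta)^{n-2}$. If $\chi'$ is increased to $\chi' + d\chi'$ and $\theta$ to $\theta + d\theta$, then a disk of area $\chi'\,d\chi'\,d\theta$ moves round this surface and hence covers a volume proportional to

$$(\chi'\sin\theta)^{n-2}\,\chi'\,d\chi'\,d\theta.$$

To obtain the distribution of $\chi'$ alone, we integrate out $\theta$. Thus

$$p(\chi')\,d\chi' = C\int_0^{\pi} e^{-\frac12(\chi'^2 + \lambda - 2\chi'\sqrt{\lambda}\cos\theta)}\,(\chi'\sin\theta)^{n-2}\,\chi'\,d\theta\,d\chi',$$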


The non-central $\chi^2$- and $F$-distributions and their applications
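which is equivalent to

$$p(\chi'^2)\,d\chi'^2 = \tfrac12 C\,e^{-\frac12(\chi'^2 + \lambda)}\,(\chi'^2)^{\frac12 n - 1}\,d\chi'^2\int_0^{\pi} e^{\sqrt{\lambda\chi'^2}\cos\theta}\,\sin^{n-2}\theta\,d\theta.$$

Expanding the integrand and integrating term by term (the odd powers of $\cos\theta$ give zero), we find

$$p(\chi'^2) = \tfrac12 C\,\sqrt{\pi}\,\Gamma\{\tfrac12(n-1)\}\,e^{-\frac12(\chi'^2 + \lambda)}\,(\chi'^2)^{\frac12 n - 1}\sum_{j=0}^{\infty}\frac{(\lambda\chi'^2)^j}{2^{2j}\,j!\,\Gamma(\frac12 n + j)}. \qquad (3)$$

If zero is substituted for $\lambda$, this reduces to the ordinary $\chi^2$-distribution, which therefore gives us the value of $C$. We then have

$$p(\chi'^2) = \frac{e^{-\frac12(\chi'^2 + \lambda)}\,(\chi'^2)^{\frac12 n - 1}}{2^{\frac12 n}}\sum_{j=0}^{\infty}\frac{(\lambda\chi'^2)^j}{2^{2j}\,j!\,\Gamma(\frac12 n + j)}. \qquad (4)$$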


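2.2. Derivation through a transformation of variates

Next we will show that it is possible to effect a variate transformation so as to transform $\chi'^2$ into a sum of $(n-1)$ central squares and a single non-central square, and then derive its distribution. Make the following orthogonal transformation:

$$y_1 = c_{11}\xi_1 + c_{12}\xi_2 + \cdots + c_{1n}\xi_n, \quad \ldots, \quad y_n = c_{n1}\xi_1 + c_{n2}\xi_2 + \cdots + c_{nn}\xi_n. \qquad (5)$$

Then

$$\sum_{i=1}^{n}\xi_i^2 = \sum_{j=1}^{n} y_j^2.$$

Generally, if

$$E(y_j) = c_{j1}a_1 + c_{j2}a_2 + \cdots + c_{jn}a_n = b_j \qquad (j = 1 \text{ to } n),$$

we have

$$\sum_{i=1}^{n} a_i^2 = \sum_{j=1}^{n} b_j^2 \quad\text{and}\quad \sum_{i=1}^{n} a_i\xi_i = \sum_{j=1}^{n} b_j y_j. \qquad (6)$$

Now we can make

$$b_1 = b_2 = \cdots = b_{n-1} = 0 \quad\text{and}\quad b_n = \sqrt{\textstyle\sum a_i^2} = \sqrt{\lambda}.$$

Thus $\chi'^2 = \sum_1^n \xi_i^2$ is distributed as $\sum_1^{n-1} y_j^2 + y_n^2$, the sum of the squares of $(n-1)$ normal variates with mean zero and the square of a single normal variate with mean $\sqrt{\lambda}$, the s.d.'s being unity.

Writing

$$\sum_1^{n-1} y_j^2 = u, \qquad \chi'^2 = w \qquad\text{and}\qquad y_n^2 = v,$$

we see that $u$ has a $\chi^2$-distribution with $(n-1)$ degrees of freedom, that is,

$$p(u) = \frac{e^{-\frac12 u}\,u^{\frac12(n-3)}}{2^{\frac12(n-1)}\,\Gamma\{\frac12(n-1)\}},$$

and that $v$ follows the law

$$p(v) = \frac{1}{2\sqrt{2\pi v}}\bigl\{e^{-\frac12(\sqrt{v} - \sqrt{\lambda})^2} + e^{-\frac12(\sqrt{v} + \sqrt{\lambda})^2}\bigr\} = \frac{e^{-\frac12(v + \lambda)}\,v^{-\frac12}}{2^{\frac12}\,\Gamma(\frac12)}\Bigl(1 + \frac{v\lambda}{2!} + \frac{(v\lambda)^2}{4!} + \cdots\Bigr).$$

Hence, replacing $v$ by $(w - u)$ in the joint probability law $p(u, v)$, we have

$$p(u, w) = \frac{e^{-\frac12 w}\,e^{-\frac12\lambda}\,u^{\frac12(n-3)}\,(w - u)^{-\frac12}}{2^{\frac12 n}\,\Gamma(\frac12)\,\Gamma\{\frac12(n-1)\}}\Bigl\{1 + \frac{(w - u)\lambda}{2!} + \frac{(w - u)^2\lambda^2}{4!} + \cdots\Bigr\}.$$

Whence, integrating with respect to $u$ from 0 to $w$, we obtain

$$p(w) = \frac{e^{-\frac12 w}\,e^{-\frac12\lambda}\,w^{\frac12(n-2)}}{2^{\frac12 n}\,\Gamma(\frac12)\,\Gamma\{\frac12(n-1)\}}\Bigl\{B\bigl(\tfrac{n-1}{2}, \tfrac12\bigr) + \frac{w\lambda}{2!}\,B\bigl(\tfrac{n-1}{2}, \tfrac32\bigr) + \frac{w^2\lambda^2}{4!}\,B\bigl(\tfrac{n-1}{2}, \tfrac52\bigr) + \cdots\Bigr\},$$

that is,

$$p(\chi'^2) = \frac{e^{-\frac12(\chi'^2 + \lambda)}\,(\chi'^2)^{\frac12 n - 1}}{2^{\frac12 n}\,\Gamma(\frac12 n)}\Bigl\{1 + \frac{1}{n}\Bigl(\frac{\chi'^2\lambda}{2}\Bigr) + \frac{1}{n(n+2)\,2!}\Bigl(\frac{\chi'^2\lambda}{2}\Bigr)^2 + \cdots\Bigr\}, \qquad (7)$$

which is seen to be the same as (4).

In this distribution of $\chi'^2$, $n$ may be called the number of degrees of freedom and $\lambda$, which is equal to the sum of the squares $\sum_1^n a_i^2$, the non-central parameter.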
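The series (4), equivalently (7), is easy to evaluate term by term, each term following from the previous one by the factor $\lambda\chi'^2/\{4(j+1)(\frac12 n + j)\}$. The sketch below (illustrative function names; composite Simpson's rule on a truncated range is our choice, not the paper's) checks numerically that the series integrates to unity and has mean $n + \lambda$, the first cumulant in (13).

```python
import math


def noncentral_chi2_pdf(x: float, n: int, lam: float, tol: float = 1e-16) -> float:
    """Density of chi'^2 with n d.f. and parameter lam, from the series (4)/(7).

    p(x) = exp(-(x+lam)/2) x^(n/2-1) / 2^(n/2) * sum_j (lam x)^j / (4^j j! Gamma(n/2 + j)).
    """
    if x <= 0.0:
        return 0.0
    term = 1.0 / math.gamma(n / 2.0)          # j = 0 term
    total, j = term, 0
    while True:
        term *= (lam * x / 4.0) / ((j + 1) * (n / 2.0 + j))
        total += term
        j += 1
        if term < tol * total:                # terms are decreasing and negligible
            break
    return math.exp(-(x + lam) / 2.0) * x ** (n / 2.0 - 1.0) / 2.0 ** (n / 2.0) * total


def simpson(f, a: float, b: float, steps: int = 10_000) -> float:
    """Composite Simpson's rule (steps must be even)."""
    h = (b - a) / steps
    s = f(a) + f(b)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3.0


if __name__ == "__main__":
    n, lam = 4, 4.0
    mass = simpson(lambda x: noncentral_chi2_pdf(x, n, lam), 0.0, 100.0)
    mean = simpson(lambda x: x * noncentral_chi2_pdf(x, n, lam), 0.0, 100.0)
    print(mass, mean)
```

The range 0 to 100 is ample for $n = \lambda = 4$; larger parameters would need a wider range.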


2.3. Conditional distribution of $\chi'^2$ under linear constraints

Suppose the $\xi$'s are subject to $k$ $(< n)$ linear constraints. These can be transformed into an orthogonal set represented, say, by the equations

$$\sum_{i=1}^{n} c_{ji}\,\xi_i = p_j \qquad (j = 1, \ldots, k), \qquad (8)$$

where

$$\sum_{i=1}^{n} c_{ji}^2 = 1, \qquad \sum_{i=1}^{n} c_{ji}\,c_{li} = 0 \quad (j \neq l).$$

We make an orthogonal transformation of variates defined by the equations (5), so that $\sum\xi_i^2$ transforms to $\sum y_j^2$ and the $k$ constraints of (8) become simply $y_1 = p_1, \ldots, y_k = p_k$. To find the distribution of $\sum\xi_i^2$ subject to these conditions, we first see that, in virtue of the relations in (6), the joint probability law of the $\xi$'s,

$$p(\xi_1, \ldots, \xi_n) = C\exp\Bigl\{-\tfrac12\sum_{i=1}^{n}(\xi_i - a_i)^2\Bigr\},$$

transforms into

$$p(y_1, \ldots, y_n) = C\exp\Bigl\{-\tfrac12\sum_{j=1}^{n}(y_j - b_j)^2\Bigr\}.$$

When $y_1, \ldots, y_k$ take respectively the constant values $p_1, \ldots, p_k$, we have the conditional probability law

$$p(y_{k+1}, \ldots, y_n \mid p_1, \ldots, p_k) = C_1\exp\Bigl\{-\tfrac12\sum_{j=k+1}^{n}(y_j - b_j)^2\Bigr\}. \qquad (9)$$

It can be shown from (9), as in § 2.1, that the sum of the non-central squares $(y_{k+1}^2 + \cdots + y_n^2)$ is distributed as a $\chi'^2$ with $(n-k)$ degrees of freedom and parameter

$$\lambda = b_{k+1}^2 + \cdots + b_n^2.$$

From (6) we see that

$$y_{k+1}^2 + \cdots + y_n^2 = \sum_{i=1}^{n}\xi_i^2 - (p_1^2 + \cdots + p_k^2)$$

and

$$b_{k+1}^2 + \cdots + b_n^2 = \sum_{i=1}^{n} a_i^2 - (b_1^2 + \cdots + b_k^2) = \sum_{i=1}^{n} a_i^2 - \sum_{j=1}^{k}\Bigl(\sum_{i=1}^{n} a_i c_{ji}\Bigr)^2. \qquad (10)$$

Hence $\bigl(\sum_{i=1}^{n}\xi_i^2 - \sum_{j=1}^{k} p_j^2\bigr)$ is distributed as a $\chi'^2$ with $(n-k)$ degrees of freedom and parameter $\lambda$ given by the expression in (10).




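In particular, if there is only a single constraint on the $\xi$'s, given by

$$\sum_{i=1}^{n} c_i\,\xi_i = p, \qquad \sum_{i=1}^{n} c_i^2 = 1, \qquad (11)$$

then $\sum_{i=1}^{n}\xi_i^2 - p^2$ follows a $\chi'^2$-distribution with $(n-1)$ degrees of freedom and

$$\lambda = \sum_{i=1}^{n} a_i^2 - \Bigl(\sum_{i=1}^{n} a_i c_i\Bigr)^2. \qquad (12)$$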


3. Approximations to the $\chi'^2$-distribution

3.1. The $\chi^2$-approximation

Fisher (1928) has shown that the distribution function of $\chi'^2$ given by (4) can be expressed in terms of a Bessel function with imaginary argument. When $n$, the number of degrees of freedom, is odd, this can be reduced to elementary functions. When $n$ is even, we see that the probability integral

$$\int_0^{x} p(\chi'^2)\,d\chi'^2$$

can be expressed as a double Poisson sum. However, in both cases, the labour of calculating the probability integral is considerable.

In his paper, Fisher has given a table of the upper 5 % significance points of the $\chi'^2$-distribution for $n = 1$ to 7 and $\sqrt{\lambda} = 0\,(0.2)\,5.0$. Garwood† has an unpublished table of the lower 5 % points for the same range of values of $n$ and $\lambda$. No tables of the probability integral are available. It may therefore be useful to have an easy method of determining the probability integral and percentage points sufficiently accurately for any given values. For this purpose we shall consider several approximations to the distribution of $\chi'^2$.
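For even $n$ the remark about a double Poisson sum can be made explicit: since $P(\chi^2_{2k} \le x) = P\{\text{Poisson}(x/2) \ge k\}$, mixing over a Poisson($\lambda/2$) number of extra pairs of degrees of freedom gives the exact integral as a sum of Poisson terms over two Poisson variables. The sketch below is our own formulation of that remark, with illustrative names; for $n = \lambda = 4$ and $x = 10$ it agrees with the exact entry 0.7118 of Table 1.

```python
import math


def poisson_sf(k: int, mu: float) -> float:
    """P(Poisson(mu) >= k) for integer k >= 0."""
    if k <= 0:
        return 1.0
    term = math.exp(-mu)       # P(N = 0)
    cdf = term
    for i in range(1, k):
        term *= mu / i
        cdf += term
    return 1.0 - cdf


def noncentral_chi2_cdf_even(x: float, n: int, lam: float, tol: float = 1e-15) -> float:
    """Exact P(chi'^2 <= x) for even n: sum_j Pois(j; lam/2) * P(Pois(x/2) >= n/2 + j)."""
    if n % 2:
        raise ValueError("this double Poisson form needs an even number of degrees of freedom")
    weight = math.exp(-lam / 2.0)   # Poisson(lam/2) probability of j = 0
    total, j = 0.0, 0
    while True:
        total += weight * poisson_sf(n // 2 + j, x / 2.0)
        j += 1
        weight *= (lam / 2.0) / j
        if j > lam / 2.0 and weight < tol:   # past the mode and negligible
            break
    return total


if __name__ == "__main__":
    print(round(noncentral_chi2_cdf_even(10.0, 4, 4.0), 4))
```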

The characteristic function of this distribution is easily seen to be

$$\phi(t) = \exp\{it\lambda/(1 - 2it)\}\,(1 - 2it)^{-\frac12 n}.$$

Hence we have the following cumulants:

$$\kappa_1 = n + \lambda, \quad \kappa_2 = 2(n + 2\lambda), \quad \kappa_3 = 8(n + 3\lambda), \quad \kappa_4 = 48(n + 4\lambda), \qquad (13)$$

the general $r$th cumulant being

$$\kappa_r = 2^{r-1}(r-1)!\,(n + r\lambda).$$

In the $(\beta_1, \beta_2)$ diagram, it was found that the point computed from the above $\kappa$'s moved close to and above the Type III line, and this suggested that we might fit a Type III distribution from the first two moments. This is given by

$$f(y) = \frac{e^{-\frac12 y}\,y^{\frac12\nu - 1}}{2^{\frac12\nu}\,\Gamma(\frac12\nu)}, \qquad (14)$$

where $y = \chi'^2/\rho$,

$$\rho = \frac{n + 2\lambda}{n + \lambda} = 1 + \frac{\lambda}{n + \lambda}, \qquad \nu = \frac{(n + \lambda)^2}{n + 2\lambda} = n + \frac{\lambda^2}{n + 2\lambda}. \qquad (15)$$

This means that we are representing the distribution of $(\chi'^2/\rho)$ by that of $\chi^2$ with $\nu$ degrees of freedom, $\nu$ being in general a fraction.
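The fit (15) is simple arithmetic on the cumulants (13): $\rho = \kappa_2/(2\kappa_1)$ and $\nu = 2\kappa_1^2/\kappa_2$. The sketch below (illustrative names) computes them and compares the higher cumulants of $\rho\chi^2_\nu$, which are $\rho^r 2^{r-1}(r-1)!\,\nu$, with the true ones; the fitted values are the larger for $r > 2$, in line with the statement in § 3.3 that $\kappa_r^* > \kappa_r$. It also reproduces the values of $\nu$ quoted for $n = 25$ in § 3.2.

```python
import math


def cumulant(r: int, n: float, lam: float) -> float:
    """r-th cumulant of chi'^2 with n d.f. and parameter lam: 2^(r-1) (r-1)! (n + r lam)."""
    return 2 ** (r - 1) * math.factorial(r - 1) * (n + r * lam)


def type3_fit(n: float, lam: float) -> tuple[float, float]:
    """(rho, nu) of (15), chosen so that rho * chi2(nu) matches the first two cumulants."""
    k1, k2 = cumulant(1, n, lam), cumulant(2, n, lam)
    rho = k2 / (2.0 * k1)        # = (n + 2 lam) / (n + lam)
    nu = 2.0 * k1 ** 2 / k2       # = (n + lam)^2 / (n + 2 lam)
    return rho, nu


def fitted_cumulant(r: int, n: float, lam: float) -> float:
    """r-th cumulant of rho * chi2(nu): rho^r * 2^(r-1) (r-1)! * nu."""
    rho, nu = type3_fit(n, lam)
    return rho ** r * 2 ** (r - 1) * math.factorial(r - 1) * nu


if __name__ == "__main__":
    print([round(type3_fit(25, lam)[1], 2) for lam in (0, 10, 20, 30, 40)])
    print(cumulant(3, 4, 4), fitted_cumulant(3, 4, 4))
```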


† I am grateful to Dr F. Garwood for kindly making his table available to me for reference.





In what follows we shall write $x$ for $\chi'^2$, $p(x)$ for the true distribution of $\chi'^2$ with $n$ degrees of freedom and parameter $\lambda$, and $f(x)$ for the approximation to $p(x)$ obtained by assuming that $x/\rho = y$ is distributed as $\chi^2$ with $\nu$ degrees of freedom.

Then the probability integral

$$\int_0^{x} p(x)\,dx = \int_0^{y} p(y)\,dy$$

is approximately given by $\int_0^{y} f(y)\,dy$.

This integral can be expressed in the notation of the Tables of the Incomplete $\Gamma$-function (K. Pearson, 1922) as $I(u, p)$, where

$$u = \frac{x}{\sqrt{2(n + 2\lambda)}}, \qquad p = \frac{(n + \lambda)^2}{2(n + 2\lambda)} - 1, \qquad (16)$$

and could be evaluated by interpolation in these tables. For interpolation $u$-wise the second differences with Everett interpolation coefficients may be used, while linear interpolation $p$-wise seems adequate.

The approximations to the probability integral so obtained for certain values of $n$, $\lambda$ and $x$ are shown in Table 1 for comparison with the exact values. In some of these cases $x$ is the upper 5 % point (Fisher) or the lower 5 % point (Garwood), so that the exact values are 0.95 or 0.05. The others are directly computed. For many purposes, especially in connexion with power functions, the degree of accuracy given by this method may be considered quite adequate.
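Instead of interpolating in Pearson's tables, $I(u, p)$ can be evaluated directly as a regularized incomplete gamma ratio: with $y = x/\rho$, the first approximation is $P(\tfrac12\nu, \tfrac12 y)$ in the usual notation. The sketch below uses the standard power series for that ratio (a substitute for the tables, not the paper's procedure; names are illustrative) and reproduces the approximate value 0.7191 of Table 1 for $n = \lambda = 4$, $x = 10$.

```python
import math


def gamma_ratio(a: float, z: float) -> float:
    """Regularized lower incomplete gamma P(a, z) by its power series (fine for moderate z)."""
    if z <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= z / (a + k)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def first_approx_cdf(x: float, n: float, lam: float) -> float:
    """First approximation: P(chi'^2 <= x) ~ P(chi2(nu) <= x / rho), rho and nu from (15)."""
    rho = (n + 2.0 * lam) / (n + lam)
    nu = (n + lam) ** 2 / (n + 2.0 * lam)
    return gamma_ratio(nu / 2.0, x / (2.0 * rho))


if __name__ == "__main__":
    print(round(first_approx_cdf(10.0, 4, 4.0), 4))
```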

Table 1. Showing exact and approximate values of the $\chi'^2$-integrals, $\int_0^x p(x)\,dx$

 n     λ       x       Approx.   Exact
 4     4     1.765    0.0399    0.0500
 4     4    10.000    0.7191    0.7118
 4     4    17.309    0.9492    0.9500
 4     4    24.000    0.9913    0.9925
 4    10    10.000    0.3178    0.3148
 7     1     4.000    0.1621    0.1628
 7     1    16.004    0.9499    0.9500
 7    16    10.257    0.0430    0.0500
 7    16    24.000    0.5947    0.5898
 7    16    38.970    0.9482    0.9500
12     6    24.000    0.8187    0.8174
12    18    24.000    0.2936    0.2901
16     8    30.000    0.7895    0.7880
16     8    40.000    0.9626    0.9632
16    32    30.000    0.0590    0.0609
16    32    60.000    0.8329    0.8316
24    24    36.000    0.1556    0.1567
24    24    48.000    0.5333    0.5296
24    24    72.000    0.9656    0.9667





To find the percentage points of the $\chi'^2$-distribution, we first interpolate in the appropriate percentage point tables of $\chi^2$ (e.g. Thompson, 1941) for $\nu$ degrees of freedom and then multiply the interpolate by $\rho$. Four-point Lagrangian interpolation formulae may be used. The approximate upper and lower 5 % points obtained by this method for certain values of $n$ and $\lambda$ are given in Table 2, along with the exact values. Clearly the accuracy is not as good for the lower points as for the upper ones. Although the comparisons have had to be confined to small values of $n$, since Fisher and Garwood have only given exact percentage points up to $n = 7$, the closeness of the probability integral approximation (Table 1) suggests that the approximation to the percentage points would still be fairly close for higher $n$.

These approximations based on the $\chi^2$ fit will be referred to in subsequent sections as the first approximation.


Table 2. Showing exact and approximate values of the percentage points of the $\chi'^2$-distribution

             Upper 5 % point      Lower 5 % point
 n     λ     Approx.   Exact      Approx.   Exact
 2     1      8.63      8.64       0.20      0.17
 2     4     14.72     14.64       0.94      0.65
 2    16     33.35     33.06       6.89      6.32
 2    25     45.66     45.31      12.68     12.08
 4     1     11.72     11.71       0.93      0.91
 4     4     17.38     17.31       1.95      1.77
 4    16     35.69     35.43       8.36      7.88
 4    25     47.94     47.61      14.26     13.73
 7     1     16.01     16.00       2.51      2.49
 7     4     21.28     21.23       3.78      3.66
 7    16     39.16     38.97      10.64     10.26
 7    25     51.34     51.06      16.68     16.23
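The recipe "take the $\chi^2$ percentage point for $\nu$ degrees of freedom, then multiply by $\rho$" can be sketched by inverting the incomplete gamma ratio numerically (bisection) instead of interpolating in Thompson's table; the function names are illustrative. For $(n, \lambda) = (4, 16)$ and $(7, 4)$ the upper 5 % points it gives are close to the approximate values 35.69 and 21.28 of Table 2, and for $\lambda = 0$ it returns the ordinary $\chi^2$ point.

```python
import math


def gamma_ratio(a: float, z: float) -> float:
    """Regularized lower incomplete gamma P(a, z) from its power series."""
    if z <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= z / (a + k)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def chi2_quantile(prob: float, nu: float) -> float:
    """x such that P(chi-square(nu) <= x) = prob, found by bisection."""
    lo, hi = 0.0, max(1.0, nu)
    while gamma_ratio(nu / 2.0, hi / 2.0) < prob:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gamma_ratio(nu / 2.0, mid / 2.0) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def approx_percentage_point(prob: float, n: float, lam: float) -> float:
    """First approximation to a chi'^2 point: rho times the chi-square(nu) point, (15)."""
    rho = (n + 2.0 * lam) / (n + lam)
    nu = (n + lam) ** 2 / (n + 2.0 * lam)
    return rho * chi2_quantile(prob, nu)


if __name__ == "__main__":
    for n, lam in ((4, 16.0), (7, 4.0)):
        upper = approx_percentage_point(0.95, n, lam)
        lower = approx_percentage_point(0.05, n, lam)
        print(n, lam, round(upper, 2), round(lower, 2))
```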


3.2. The normal approximation

It is known that, for $n > 30$, Fisher's approximation, that $\sqrt{2\chi^2}$ is distributed as a normal variate $N(\sqrt{2n-1}, 1)$,† will give fairly close values to the probability integral and percentage points of the $\chi^2$-distribution. It can be shown that a similar normal approximation is available for the $\chi'^2$-distribution for large values of $n$ or $\lambda$.

First we shall show that $\chi'$ approaches normality with greater rapidity than $\chi'^2$.

If $x$ is written for $\chi'^2$, and $x_0$ is the mean of $x$, we have by Taylor's theorem

$$x^{\frac12} = x_0^{\frac12} + \tfrac12(x - x_0)\,x_0^{-\frac12} - \tfrac18(x - x_0)^2\,x_0^{-\frac32} + \tfrac1{16}(x - x_0)^3\,x_0^{-\frac52} - \cdots,$$

$$x^{\frac32} = x_0^{\frac32} + \tfrac32(x - x_0)\,x_0^{\frac12} + \tfrac38(x - x_0)^2\,x_0^{-\frac12} - \tfrac1{16}(x - x_0)^3\,x_0^{-\frac32} + \cdots.$$

† Here and below the notation $N(a, b)$ is used to indicate that a variable is normally distributed with mean $a$ and standard deviation $b$.






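By taking expectations on both sides and substituting from (13) the moments of $x = \chi'^2$, we get $\mu_1'$ and $\mu_3'$ of $\chi'$. Also

$$\mu_2'(\chi') = \mu_1'(x), \qquad \mu_4'(\chi') = \mu_2'(x).$$

Hence we derive the following moments:

$$\mu_1' = (n + \lambda)^{\frac12} - \frac{n + 2\lambda}{4(n + \lambda)^{3/2}} + \frac{n + 3\lambda}{2(n + \lambda)^{5/2}} - \frac{15(n + 2\lambda)^2}{32(n + \lambda)^{7/2}} + \cdots,$$

$$\mu_2' = n + \lambda,$$

$$\mu_3' = (n + \lambda)^{\frac32} + \frac{3(n + 2\lambda)}{4(n + \lambda)^{1/2}} - \frac{n + 3\lambda}{2(n + \lambda)^{3/2}} + \frac{9(n + 2\lambda)^2}{32(n + \lambda)^{5/2}} + \cdots,$$

$$\mu_4' = (n + \lambda)^2 + 2(n + 2\lambda),$$

from which we obtain

$$\mu_1' = (n + \lambda)^{\frac12}\Bigl\{1 - \frac{n + 2\lambda}{4(n + \lambda)^2} + \cdots\Bigr\}, \qquad \mu_2 = \frac{n + 2\lambda}{2(n + \lambda)}\Bigl\{1 - \frac{2(n + 3\lambda)}{(n + \lambda)(n + 2\lambda)} + \frac{7(n + 2\lambda)}{4(n + \lambda)^2} + \cdots\Bigr\}.$$

Hence

$$\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} = \frac{n^2 + 4n\lambda}{\sqrt{2}\,(n + \lambda)(n + 2\lambda)^{3/2}} + \cdots, \qquad \gamma_2 = \frac{\mu_4}{\mu_2^2} - 3 = O[(n + \lambda)^{-2}].$$

Comparing these with the corresponding coefficients of the $\chi'^2$-distribution, viz.

$$\gamma_1 = \frac{\sqrt{8}\,(n + 3\lambda)}{(n + 2\lambda)^{3/2}}, \qquad \gamma_2 = \frac{12(n + 4\lambda)}{(n + 2\lambda)^2},$$

we see that $\chi'$ approaches normality faster than $\chi'^2$.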

From the above it follows that $\sqrt{2\chi'^2}$ has mean $\sqrt{2(n + \lambda) - (n + 2\lambda)/(n + \lambda)}$ to order $(n + \lambda)^{-\frac12}$ and variance $(n + 2\lambda)/(n + \lambda)$ to order $(n + \lambda)^{-1}$. We can therefore regard

$$\sqrt{\frac{2\chi'^2(n + \lambda)}{n + 2\lambda}}$$

as distributed normally with mean

$$\sqrt{\frac{2(n + \lambda)^2}{n + 2\lambda} - 1}$$

and variance unity.

This result may also be derived by taking the $\chi^2$-approximation to the $\chi'^2$-distribution and then using the known result that, for large $\nu$, $\sqrt{2\chi^2}$ is distributed as $N[\sqrt{2\nu - 1}, 1]$. For, substituting $\chi'^2/\rho$ for $\chi^2$ and the expressions in (15) for $\rho$ and $\nu$, we reach the same normal approximation.

Since $\nu > n$ from (15) (for $\lambda > 0$), it can be seen that the normal approximation to $\chi'$ with $n$ degrees of freedom will be better than the normal approximation to $\chi$ with the same degrees of freedom. Thus, for example, if $n = 25$, we have

λ =   0      10      20      30      40
ν =  25    27.22   31.15   35.59   40.24

Hence for sufficiently large values of $n$ and $\lambda$, the probability integral and percentage points may be obtained from the normal tables. Table 3 gives a comparison of some values of the probability integral, thus calculated, with the exact values.
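The normal approximation reduces to one evaluation of the normal integral: $P(\chi'^2 \le x) \approx \Phi\{\sqrt{2x/\rho} - \sqrt{2\nu - 1}\}$, with $\rho$ and $\nu$ from (15). A sketch using `math.erf` for $\Phi$ (function names illustrative); for $n = 16$, $\lambda = 32$ (so that $\nu = 28.8$) and $x = 30$ and 60 it gives about 0.0638 and 0.8320, the 'From normal' entries of Table 3.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal integral Phi(z) via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_approx_cdf(x: float, n: float, lam: float) -> float:
    """P(chi'^2 <= x) ~ Phi(sqrt(2x/rho) - sqrt(2 nu - 1)), with rho and nu from (15)."""
    rho = (n + 2.0 * lam) / (n + lam)
    nu = (n + lam) ** 2 / (n + 2.0 * lam)
    return normal_cdf(math.sqrt(2.0 * x / rho) - math.sqrt(2.0 * nu - 1.0))


if __name__ == "__main__":
    for n, lam, x in ((16, 32.0, 30.0), (16, 32.0, 60.0), (24, 24.0, 36.0), (24, 24.0, 72.0)):
        print(n, lam, x, round(normal_approx_cdf(x, n, lam), 4))
```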

Table 3. Values of the $\chi'^2$-integral on the normal approximation

 n     λ      ν      x    From χ²   From normal   Exact
16    32    28.8    30    0.0590     0.0638      0.0609
16    32    28.8    60    0.8326     0.8320      0.8316
24    24    32.0    36    0.1556     0.1565      0.1567
24    24    32.0    72    0.9656     0.9686      0.9667


3.3. Closer approximations to the $\chi'^2$-distribution

The probability function of $\chi'^2$ can be represented in the form of a series with the fitted probability function of $(\rho\chi^2)$ as the leading term and, from these mathematical expansions, closer approximations to the probability integral and percentage points may be obtained. Two methods will be briefly considered.


First method

The cumulants of the distribution $f(x)$, as defined on p. 207 above, are seen to be

$$\kappa_1^* = n + \lambda, \quad \kappa_2^* = 2(n + 2\lambda), \quad \kappa_3^* = \frac{8(n + 2\lambda)^2}{n + \lambda}, \quad \kappa_4^* = \frac{48(n + 2\lambda)^3}{(n + \lambda)^2}, \qquad (17)$$

the $r$th cumulant being

$$\kappa_r^* = 2^{r-1}(r-1)!\,\frac{(n + 2\lambda)^{r-1}}{(n + \lambda)^{r-2}}.$$

Comparing these with the corresponding cumulants of $p(x)$ in (13), we find $\kappa_r^* > \kappa_r$ for $r > 2$. Let us write

$$\kappa_3 - \kappa_3^* = c_3, \qquad \kappa_4 - \kappa_4^* = c_4, \qquad \ldots. \qquad (18)$$

Then the corresponding differences of cumulants of $p(y)$ and $f(y)$, as defined on p. 207, will be $c_3/\rho^3$, $c_4/\rho^4$, ....

By the application of the Edgeworth operator to $f(y)$ we have

$$p(y) = \exp\Bigl[-\frac{c_3}{6\rho^3}\frac{d^3}{dy^3} + \frac{c_4}{24\rho^4}\frac{d^4}{dy^4} - \cdots\Bigr]f(y).$$

Hence the probability integral $\int_0^y p(y)\,dy$ is given by

$$\int_0^y f(y)\,dy + \Bigl[-\frac{c_3}{6\rho^3}f^{(2)}(y) + \frac{c_4}{24\rho^4}f^{(3)}(y) + \frac{1}{2!}\Bigl\{\Bigl(\frac{c_3}{6\rho^3}\Bigr)^2 f^{(5)}(y) + \cdots\Bigr\} + \cdots\Bigr]. \qquad (19)$$

Since the higher derivatives of $f(y)$ become smaller in value for a given $y$, we retain only the first term in the square brackets of (19) and get a second approximation to the probability integral in the form

$$\int_0^y f(y)\,dy - \frac{c_3}{6\rho^3}\int_0^y \frac{d^3 f}{dy^3}\,dy,$$






which can be written as

$$I(u, p) - \frac{c_3}{6\rho^3\{\sqrt{2\nu}\}^3}\,\frac{d^3 I}{du^3}. \qquad (20)$$

When using the expression (20) for the evaluation of the integral, the computation of the first term $I(u, p)$ will, in general, require interpolation in the tables of the Incomplete $\Gamma$-function. We shall now show that by a suitable modification of the Everett interpolation formula, the second term in (20) can be accounted for and the whole expression computed in one calculation.

If $u_1$, $u_2$ are the tabulated values between which $u$ lies and $\Delta_1''$, $\Delta_2''$ the tabulated second differences, we have as an approximation

$$\frac{d^3 I}{du^3} \approx (\Delta_2'' - \Delta_1'')\,10^3,$$

the interval for $u$ being 0.1 in the tables. Suppose $q$ is the fraction $(u - u_1)/(u_2 - u_1)$, $E_1'$, $E_2'$ the second-order Everett interpolation coefficients corresponding to $q$, and

$$k = \frac{c_3\,10^3}{6\rho^3\{\sqrt{2\nu}\}^3}.$$

Then (20) becomes

$$I(u_1, p)(1 - q) + I(u_2, p)\,q + \Delta_1''(E_1' + k) + \Delta_2''(E_2' - k). \qquad (21)$$

If $p$ is not a tabled value but lies between $p_1$ and $p_2$, then we evaluate the above expression for $p_1$ and for $p_2$ and then interpolate linearly for $p$.
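Rather than modifying Everett's formula, the correction can be checked by evaluating it directly. Since $\int_0^y f'''\,dy = f''(y)$ apart from a boundary term at the origin, which is ignored here, the second approximation of method I is $I(u, p) - \{c_3/(6\rho^3)\}\,f''(y)$, where we take $c_3 = \kappa_3 - \kappa_3^*$ (negative, since $\kappa_3^* > \kappa_3$). The sketch below computes $f''$ of the $\chi^2_\nu$ density analytically; the sign convention and all names are our reading of the method, not a quotation. For $n = \lambda = 4$, $x = 10$ it gives about 0.7209, the method I entry of Table 4.

```python
import math


def gamma_ratio(a: float, z: float) -> float:
    """Regularized lower incomplete gamma P(a, z) from its power series."""
    if z <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= z / (a + k)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def chi2_density_second_derivative(y: float, nu: float) -> float:
    """f''(y) for the chi-square density f(y) = y^(a-1) e^(-y/2) / (2^a Gamma(a)), a = nu/2."""
    a = nu / 2.0
    f = math.exp((a - 1.0) * math.log(y) - y / 2.0 - a * math.log(2.0) - math.lgamma(a))
    g = (a - 1.0) / y - 0.5                   # f'/f
    return f * (g * g - (a - 1.0) / y ** 2)   # f''/f = g^2 + g'


def second_approx_method_one(x: float, n: float, lam: float) -> float:
    """I(u, p) - c3/(6 rho^3) f''(y) with c3 = kappa_3 - kappa_3* (boundary term at 0 ignored)."""
    rho = (n + 2.0 * lam) / (n + lam)
    nu = (n + lam) ** 2 / (n + 2.0 * lam)
    y = x / rho
    first = gamma_ratio(nu / 2.0, y / 2.0)                               # I(u, p)
    c3 = 8.0 * (n + 3.0 * lam) - 8.0 * (n + 2.0 * lam) ** 2 / (n + lam)
    return first - c3 / (6.0 * rho ** 3) * chi2_density_second_derivative(y, nu)


if __name__ == "__main__":
    print(round(second_approx_method_one(10.0, 4, 4.0), 4))  # compare Table 4, method I
```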


Second method

It is well known that by using the Edgeworth form of the Gram-Charlier Type A series, a frequency function can be normalized if it approaches normality asymptotically and if its cumulants are in increasing order of some quantity $n^{-1}$.

Goldberg & Levine (1946) have shown that by the method of normalization the percentage points of the $\chi^2$-distribution could be obtained to a fairly good degree of accuracy. A similar method might be applied usefully to the $\chi'^2$-distribution. However, a modified form of expansion with the fitted $\chi^2$-function as the first term will be found more suitable.

Let us standardize the variate $x$ (written for $\chi'^2$) by introducing

$$\xi = \frac{x - (n + \lambda)}{\sqrt{2n + 4\lambda}}.$$

Then, using the same notation as before, the cumulants of the distribution $p(\xi)$ are

$$0, \ 1, \ \kappa_3/\kappa_2^{3/2}, \ \kappa_4/\kappa_2^2, \ \ldots.$$

Since $f(x)$ has the same mean and standard deviation as $p(x)$, we get for the cumulants of $f(\xi)$

$$0, \ 1, \ \kappa_3^*/\kappa_2^{3/2}, \ \kappa_4^*/\kappa_2^2, \ \ldots.$$

These cumulants, from the third onwards, are of orders $-\frac12, -1, -\frac32, \ldots$ in both $n$ and $\lambda$. Now let

$$\alpha = \alpha(\xi) = e^{-\frac12\xi^2}/\sqrt{2\pi},$$

and let $H_3, H_4, \ldots$ be the Hermite polynomials of orders 3, 4, .... Then we have, arranging the terms in order of magnitude of $n$ (Kendall, I, 1945, § 6.32),

$$p(\xi) = \alpha(\xi)\Bigl[1 + \frac{\kappa_3}{6\kappa_2^{3/2}}H_3 + \Bigl\{\frac{\kappa_4}{24\kappa_2^2}H_4 + \frac{\kappa_3^2}{72\kappa_2^3}H_6\Bigr\} + \Bigl\{\frac{\kappa_5}{120\kappa_2^{5/2}}H_5 + \frac{\kappa_3\kappa_4}{144\kappa_2^{7/2}}H_7 + \frac{\kappa_3^3}{1296\kappa_2^{9/2}}H_9\Bigr\} + \cdots\Bigr]. \qquad (22)$$

There is a similar expansion for $f(\xi)$ with $\kappa_r^*$ in place of $\kappa_r$ $(r > 2)$.




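Now we subtract formally this second series from the first, term by term, and transfer $f(\xi)$ to the right-hand side. We then obtain

$$p(\xi) = f(\xi) + \alpha(\xi)\Bigl[\frac{c_3}{6\kappa_2^{3/2}}H_3 + \Bigl\{\frac{c_4}{24\kappa_2^2}H_4 + \frac{c_{33}}{72\kappa_2^3}H_6\Bigr\} + \Bigl\{\frac{c_5}{120\kappa_2^{5/2}}H_5 + \frac{c_{34}}{144\kappa_2^{7/2}}H_7 + \frac{c_{333}}{1296\kappa_2^{9/2}}H_9\Bigr\} + \cdots\Bigr], \qquad (23)$$

where $c_3$, $c_4$ have the same meanings as in (18) and $c_{rst}$ is written for $(\kappa_r\kappa_s\kappa_t - \kappa_r^*\kappa_s^*\kappa_t^*)$, and similarly for two suffixes.

We know that the infinite series in (23) is not uniformly convergent. We can still integrate it formally term by term and make use of the first few terms to get a better approximation than that given by the integral of $f(\xi)$ alone. Thus, retaining terms up to $O(n^{-\frac32})$, we derive an approximation to the probability integral in the form

$$\int_0^x p(x)\,dx = \int_{-\infty}^{\xi} f(\xi)\,d\xi - \alpha(\xi)\Bigl[\frac{c_3}{6\kappa_2^{3/2}}H_2 + \frac{c_4}{24\kappa_2^2}H_3 + \frac{c_{33}}{72\kappa_2^3}H_5 + \frac{c_5}{120\kappa_2^{5/2}}H_4 + \frac{c_{34}}{144\kappa_2^{7/2}}H_6 + \frac{c_{333}}{1296\kappa_2^{9/2}}H_8\Bigr]. \qquad (24)$$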


The first term in (24) is our first approximation of § 3.1, and the rest give a correction to it which is seen to result in a considerable improvement (see Table 4). For evaluating this expression, the values of the Hermite polynomials may be taken from Jorgensen's tables (1916) if $\xi$ is an argument tabled there; otherwise they have to be directly calculated. $\alpha(\xi)$ may be found (without need for interpolation) from Tables of the Probability Functions, Vol. 2 (Federal Works Agency, New York, 1942).

The coefficients in (24) involve only differences of the cumulants and so are smaller than the corresponding coefficients in (22). Thus a closer approximation is likely to result from (24) than from the same order of terms in (22).

For the percentage points, we employ the inversion of the Gram-Charlier series obtained 
by Cornish & Fisher (1937). If x, x' and £ are respectively the percentage points of the 
distributions p{x),f(x) and a(£), then for a given probability level, we have 


_ fffy J_ 

/(2w + 4A) ^ aa a S4tn ^ ar expansion with k* in place of K r (r > 2). By differencing as before we 
obtain an expression for x in terms of x' and £. Retaining terms up to 0(«H), we find 


x = !r' + ^/(2w+ 4A) 




1 ) + 




■eg) + 




f (12 ^ 4 - 53 ^+ 17) }]- ( 2B 


5) 


In this, x' is our first approximation, and the correction improves it considerably even at the lower end of the distribution. The values of the expressions in ξ in (25) are directly available for several probability levels from the table in Cornish & Fisher's paper.

The approximate values of the probability integral of the χ'²-distribution obtained by these methods in a few cases are given in Table 4. Table 5 shows the approximate upper and lower 5% points evaluated by method II.

Comparing the two methods for the probability integral, the second one, employing terms of the Gram-Charlier series up to O(n^{-3/2}), gives greater accuracy and is to be preferred,



P. B. Patnaik 


213 


although from the point of view of labour and time involved, the first method is simpler and 
easier to apply. With respect to the percentage points, the method using the Cornish-Fisher 
inversion appears to be quite good, particularly at the upper points, but it does involve 
a certain amount of labour. 


Table 4. Closer approximations to the χ'²-integral

  n    λ      x      1st approx.   2nd approx.,   2nd approx.,   Exact
                                   method I       method II
  4    4   10·00       0·7191        0·7209         0·7119       0·7118
  4    4   24·00       0·9913        0·9917         0·9913       0·9925
  7   16   24·00       0·5947        0·5938         0·5869       0·5898
  7   16   38·97       0·9482        0·9504         0·9502       0·9500
 16    8   20·00       0·3380        0·3345         0·3368       0·3369
 16    8   40·00       0·9626        0·9632         0·9631       0·9632

Table 5. Closer approximation to the χ'²-percentage points, using method II

               Upper 5% point               Lower 5% point
  n    λ    1st      2nd      Exact      1st      2nd      Exact
            approx.  approx.             approx.  approx.
  2    4    14·72    14·67    14·64      0·945    0·574    0·646
  2   16    33·35    33·06    33·06      6·891    6·526    6·322
  4    4    17·38    17·33    17·31      1·954    1·731    1·705
  4   16    35·69    35·42    35·43      8·363    8·017    7·884
  7    4    21·28    21·27    21·23      3·789    3·750    3·664
  7   16    39·16    38·97    38·97     10·637   10·267   10·257
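Equation (25) can be evaluated directly once x' is known. The sketch below is an illustration rather than the author's computing procedure: it finds x' by bisection on the fitted χ² integral, takes ξ from Python's `statistics.NormalDist`, and applies the Cornish-Fisher terms of (25) with the cumulant differences c_r, c_rs, c_rst defined after (23). All names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_cdf(x, df):
    """P(chi^2_df <= x) via the regularized incomplete gamma series."""
    if x <= 0.0:
        return 0.0
    a, y = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = a
    while term > total * 1e-16:
        n += 1.0
        term *= y / n
        total += term
    return total * math.exp(-y + a * math.log(y) - math.lgamma(a))


def chi2_ppf(q, df):
    """Percentage point of chi^2 with (possibly fractional) df, found by bisection."""
    lo, hi = 0.0, max(1.0, 2.0 * df)
    while chi2_cdf(hi, df) < q:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cumulant(r, n, lam, fitted=False):
    """kappa_r of chi'^2(n, lam), or kappa*_r of the fitted rho * chi^2(nu')."""
    scale = 2 ** (r - 1) * math.factorial(r - 1)
    if not fitted:
        return scale * (n + r * lam)
    rho = (n + 2 * lam) / (n + lam)
    nu = (n + lam) ** 2 / (n + 2 * lam)
    return scale * rho ** r * nu


def percentage_point(q, n, lam):
    """Return (x', x): the first approximation and the corrected value of (25)."""
    rho = (n + 2 * lam) / (n + lam)
    nu = (n + lam) ** 2 / (n + 2 * lam)
    x1 = rho * chi2_ppf(q, nu)
    k = {r: cumulant(r, n, lam) for r in (3, 4, 5)}
    ks = {r: cumulant(r, n, lam, fitted=True) for r in (3, 4, 5)}
    c3, c4, c5 = (k[r] - ks[r] for r in (3, 4, 5))
    c33 = k[3] ** 2 - ks[3] ** 2
    c34 = k[3] * k[4] - ks[3] * ks[4]
    c333 = k[3] ** 3 - ks[3] ** 3
    s = math.sqrt(2 * n + 4 * lam)        # sqrt(kappa_2), common to both distributions
    xi = NormalDist().inv_cdf(q)          # normal percentage point
    corr = (c3 / (6 * s ** 3) * (xi ** 2 - 1)
            + c4 / (24 * s ** 4) * (xi ** 3 - 3 * xi)
            - c33 / (36 * s ** 6) * (2 * xi ** 3 - 5 * xi)
            + c5 / (120 * s ** 5) * (xi ** 4 - 6 * xi ** 2 + 3)
            - c34 / (24 * s ** 7) * (xi ** 4 - 5 * xi ** 2 + 2)
            + c333 / (324 * s ** 9) * (12 * xi ** 4 - 53 * xi ** 2 + 17))
    return x1, x1 + s * corr


for q in (0.95, 0.05):
    x1, x2 = percentage_point(q, 4, 4)
    print(q, round(x1, 3), round(x2, 3))
```

For n = λ = 4 the printed pairs should lie close to the first and second approximations in the corresponding row of Table 5 (17·38 and 17·33 at the upper point; 1·954 and 1·731 at the lower point).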


4. Applications of the χ'²-distribution
4·1. The power function of the χ²-test

There are several possible applications of the non-central χ²-distribution in statistics. We shall consider only a few of them. We will show here how this distribution arises in the study of the power functions of χ²-tests and how the approximations of §3 are useful in this connexion.

Suppose ξ_1, ξ_2, ..., ξ_n are n independent observations in a sample. If we make the null hypothesis H_0 that the ξ_i have been drawn from a normal population with mean zero and s.d. unity, then, if H_0 is true, the statistic χ² = Σξ_i² will exceed χ²_α, the α-significance point of the χ²-distribution based on n degrees of freedom, in a proportion α of the cases.

The power of the χ²-test is given by the probability that Σξ_i² exceeds χ²_α under some alternative hypothesis. If, as an alternative to H_0, we suppose that the ξ_i have been drawn from normal populations having unit s.d. but different means a_i, then Σξ_i² will follow the non-central χ'²-distribution with n degrees of freedom and parameter λ = Σa_i². Denoting this by p_n(χ'²|λ), the power function is given by

$$\int_{\chi^2_\alpha}^{\infty} p_n(\chi'^2\mid\lambda)\,d\chi'^2 = \beta(n, \lambda, \alpha). \qquad (26)$$

Thus the power is a function of the single parameter λ, and we may write the null hypothesis as H_0(λ = 0) and an alternative as H_1(λ), where H_1 is a composite hypothesis including the family of alternatives for which Σa_i² = λ.

It was shown in §3·1 that the χ'²-distribution is fairly well approximated by a Type III distribution fitted from its first two moments. The power function β could therefore be evaluated quickly and fairly accurately by the method of the first approximation. When greater accuracy is needed, one of the other methods described in §3·3 may be used.

We give here a table (Table 6) of values of the power of the χ²-test applied at the significance level α = 0·05, obtained by the second method of §3·3. The accuracy of these values in different parts of the table can be judged from the closeness between the approximate and exact values of the probability integral shown in Tables 1 and 4. In some of the cases tabled there, the limit x was chosen near the 5% point of the corresponding χ², so as to give a value of

$$1 - \int_0^{x} p_n(\chi'^2\mid\lambda)\,d\chi'^2$$

in the neighbourhood of the power β. It is believed that, in general, Table 6 is accurate to three figures.
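Individual entries of a table such as Table 6 can be regenerated along the following lines. This is a sketch under stated assumptions, not the method used for Table 6: the "exact" power of (26) is computed from the Poisson-mixture form of χ'², the approximate power from the first approximation of §3·1, and `lam_for_power` is a hypothetical helper for the inverse problem (b) below.

```python
import math


def chi2_cdf(x, df):
    """P(chi^2_df <= x) via the regularized incomplete gamma series."""
    if x <= 0.0:
        return 0.0
    a, y = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = a
    while term > total * 1e-16:
        n += 1.0
        term *= y / n
        total += term
    return total * math.exp(-y + a * math.log(y) - math.lgamma(a))


def ncx2_cdf(x, df, lam):
    """P(chi'^2 <= x) as a Poisson(lam/2) mixture of central chi^2 with df + 2j."""
    half = lam / 2.0
    weight, total = math.exp(-half), 0.0
    for j in range(10_000):
        if j > 0:
            weight *= half / j
        total += weight * chi2_cdf(x, df + 2 * j)
        if j > half and weight < 1e-17:
            break
    return total


def chi2_ppf(q, df):
    """Percentage point of chi^2 by bisection."""
    lo, hi = 0.0, max(1.0, 2.0 * df)
    while chi2_cdf(hi, df) < q:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power_exact(n, lam, alpha=0.05):
    """beta(n, lam, alpha) of (26): P(chi'^2 > chi^2_alpha)."""
    crit = chi2_ppf(1.0 - alpha, n)
    return 1.0 - ncx2_cdf(crit, n, lam)


def power_patnaik(n, lam, alpha=0.05):
    """Same power from the first approximation: P(chi^2_nu' > chi^2_alpha / rho)."""
    crit = chi2_ppf(1.0 - alpha, n)
    rho = (n + 2 * lam) / (n + lam)
    nu = (n + lam) ** 2 / (n + 2 * lam)
    return 1.0 - chi2_cdf(crit / rho, nu)


def lam_for_power(n, beta, alpha=0.05):
    """Inverse problem (b): the lambda giving power beta, found by bisection."""
    lo, hi = 0.0, 1.0
    while power_exact(n, hi, alpha) < beta:
        hi *= 2.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if power_exact(n, mid, alpha) < beta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for lam in (5.8, 300 / 49, 14.1):
    print(lam, round(power_exact(3, lam), 3), round(power_patnaik(3, lam), 3))
print(round(lam_for_power(3, 0.90), 1))
```

With n = 3 the exact powers at λ = 5·8, 300/49 and 14·1 should come out near 0·50, 0·52 and 0·90, the values read from Table 6 in the genetic example of §4·3.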


Table 6. The power function of the χ²-test using a 5% significance level; values of β(n, λ, α), where α = 0·05



When n or λ is so large that ν = n + λ²/(n + 2λ) is over 30, we may use the normal approximation of §3·2 to obtain the power function more quickly than by the method of the χ²-approximation.

The above table can be used in a variety of ways: (a) For given λ and n, we may ask what is the chance of establishing significance at the 5% level. (b) For given n, we may ask how large λ must be to have, say, a 90% chance (β = 0·90) of establishing significance at the 5% level when a real difference in the a_i exists. (c) For given λ, we may ask how many observations are necessary to have a chance β of establishing significance.

An alternative graphical approach to the inverse problems (b) and (c) is indicated in §7·3, p. 228 below.

4·2. Application to the χ²-test for goodness of fit

The χ²-test for goodness of fit is concerned with the comparison of observed frequencies with those expected under a given hypothesis. The latter may be the theoretical frequencies of a discontinuous distribution or may be obtained by taking integrals of a continuous frequency distribution over a set of class intervals. Denote the observed frequencies by n_i and the expected frequencies by Nπ_i (i = 1, 2, ..., k), where k is the number of groups and N the total number of observations in the sample. Then

$$\sum_{i=1}^{k} n_i = \sum_{i=1}^{k} N\pi_i = N. \qquad (27)$$


As is well known, the distribution of

$$\phi^2 = \sum_{i=1}^{k} \frac{(n_i - N\pi_i)^2}{N\pi_i}, \qquad (28)$$

when the Nπ_i are the true population expectations, may be related as an approximation to that of the sum of squares of normal variables. To link up also with the non-central theory discussed in §§2·1-2·3, the following approach may be adopted, although it must be realized that the conclusions reached are not exact. As in all problems concerning φ², it is generally only possible to assess the degree of error involved, in samples of finite size, by specific numerical comparisons.

As shown originally by K. Pearson (1900, 1916), the variances and covariances of the k frequencies n_i, restricted by the condition (27), are precisely those holding in the section

$$X_1 + X_2 + \dots + X_k = 0 \qquad (29)$$

of the k-dimensioned normal probability distribution whose probability density at (X_1, X_2, ..., X_k) is

$$p(X_1, X_2, \dots, X_k) = \text{constant} \times \exp\left(-\frac{1}{2}\sum_{i=1}^{k}\frac{X_i^2}{N\pi_i}\right). \qquad (30)$$

Thus, provided that the expectations Nπ_i are large enough to prevent serious inaccuracy from discontinuity effects or boundary limitations, relationships between the n_i may be treated as relationships, within the prime (29), between normal variables X_i which in the k-dimensioned space are distributed independently with zero means and variances Nπ_i. With these limitations, we may write

$$X_i = n_i - N\pi_i \quad (i = 1, \dots, k). \qquad (31)$$


The distribution of the φ² defined in (28) can then be derived from the results given in §2·3. The condition Σn_i = N may be written

$$\sum_{i=1}^{k} \sqrt{\pi_i}\,\frac{n_i - N\pi_i}{\sqrt{N\pi_i}} = 0, \qquad (32)$$

corresponding to Σc_i x_i = ρ = 0, where Σc_i² = 1. Hence φ² will be approximately distributed as χ² with k − 1 degrees of freedom.





Having in mind the question of the power of the test, we may next ask what will be the distribution of φ² if the frequencies Nπ_i inserted into the expression (28) are not the true expectations. Suppose that the Np_i are the true expectations; both Σp_i and Σπ_i will be unity.

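In the notation of §2 we now have

$$\xi_i = \frac{n_i - N\pi_i}{\sqrt{Np_i}}, \qquad x_i = \frac{n_i - Np_i}{\sqrt{Np_i}}, \qquad \delta_i = \frac{N(p_i - \pi_i)}{\sqrt{Np_i}}, \qquad (33)$$

while

$$\sum_{i=1}^{k} \sqrt{p_i}\,\delta_i = 0. \qquad (34)$$

It follows that approximately

$$\phi'^2 = \sum_{i=1}^{k} \xi_i^2 = \sum_{i=1}^{k} \frac{(n_i - N\pi_i)^2}{Np_i} \qquad (35)$$

will be distributed as a non-central χ² with k − 1 degrees of freedom and

$$\lambda' = \sum_{i=1}^{k} \delta_i^2 = N\sum_{i=1}^{k} \frac{(p_i - \pi_i)^2}{p_i}. \qquad (36)$$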


The sum of squares we need is the φ² of (28), not the φ'² of (35). By introducing a further approximation we may, however, conclude that φ² = Σ(n_i − Nπ_i)²/(Nπ_i) is distributed as non-central χ² with k − 1 degrees of freedom and†

$$\lambda = N\sum_{i=1}^{k} \frac{(p_i - \pi_i)^2}{\pi_i}. \qquad (37)$$


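The approximation involved should not be serious if the differences δ_i = Nπ_i − Np_i are small compared to Nπ_i, for

$$\phi'^2 = \sum_i \frac{(n_i - N\pi_i)^2}{Np_i} = \sum_i \frac{(n_i - N\pi_i)^2}{N\pi_i}\left(1 - \frac{\delta_i}{N\pi_i}\right)^{-1} = \phi^2 + \sum_i \delta_i\,\frac{(n_i - N\pi_i)^2}{(N\pi_i)^2} + \sum_i \delta_i^2\,\frac{(n_i - N\pi_i)^2}{(N\pi_i)^3} + \cdots.$$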


Since the multipliers δ_i in the second term may be positive or negative and Σδ_i = 0, this term will generally be small; the further terms, containing successive powers of δ_i/(Nπ_i), will also be of diminishing importance.

This result makes it possible to determine the power of the goodness of fit test of any simple (completely specified) hypothesis H_0 (specifying probabilities π_i) with respect to a simple alternative hypothesis H_1 (specifying probabilities p_i). Hence, for any given class of alternatives H, we can determine the power function. In so far as the 5% significance level is used, the power may be determined from Table 6, p. 214, using the λ of equation (37) and k − 1 degrees of freedom. Otherwise, we can use the χ²-approximation to the χ'²-distribution developed in §3·1. Thus the power is


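$$\beta = \int_{\chi^2_\alpha}^{\infty} p_{k-1}(\chi'^2\mid\lambda)\,d\chi'^2 \approx \int_{\chi^2_\alpha/\rho}^{\infty} p_\nu(\chi^2)\,d\chi^2, \qquad (38)$$

where

$$\rho = \frac{k-1+2\lambda}{k-1+\lambda}, \qquad \nu = \frac{(k-1+\lambda)^2}{k-1+2\lambda}, \qquad \lambda = N\sum_{i=1}^{k}\frac{(p_i-\pi_i)^2}{\pi_i}. \qquad (39)$$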


† In making the approximation, we have associated the λ of (37) with the distribution of φ² rather than the λ' of (36), but this step perhaps needs fuller justification.
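The recipe of (37)-(39) is short enough to write out. In the sketch below the four-class example (all π_i = ¼; p_i = 0·3, 0·3, 0·2, 0·2; N = 200) is invented for illustration, and the helper names are hypothetical. Exact fractions make it easy to compare the λ of (37) with the λ' of (36); the power is then obtained from (38) using the χ² integral in floating point.

```python
import math
from fractions import Fraction as Fr


def chi2_cdf(x, df):
    """P(chi^2_df <= x) via the regularized incomplete gamma series."""
    if x <= 0.0:
        return 0.0
    a, y = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = a
    while term > total * 1e-16:
        n += 1.0
        term *= y / n
        total += term
    return total * math.exp(-y + a * math.log(y) - math.lgamma(a))


def chi2_ppf(q, df):
    """Percentage point of chi^2 by bisection."""
    lo, hi = 0.0, max(1.0, 2.0 * df)
    while chi2_cdf(hi, df) < q:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gof_parameters(N, pi, p):
    """lambda of (37), lambda' of (36), and rho, nu of (39) for k = len(pi) classes."""
    k = len(pi)
    lam = N * sum((pk - qk) ** 2 / qk for pk, qk in zip(p, pi))
    lam_prime = N * sum((pk - qk) ** 2 / pk for pk, qk in zip(p, pi))
    rho = (k - 1 + 2 * lam) / (k - 1 + lam)
    nu = (k - 1 + lam) ** 2 / (k - 1 + 2 * lam)
    return lam, lam_prime, rho, nu


def approx_power(N, pi, p, alpha=0.05):
    """Power from (38): P(chi^2_nu > chi^2_alpha / rho), chi^2_alpha on k - 1 d.f."""
    _, _, rho, nu = gof_parameters(N, pi, p)
    crit = chi2_ppf(1.0 - alpha, len(pi) - 1)
    return 1.0 - chi2_cdf(crit / float(rho), float(nu))


# Hypothetical example: four equally likely classes under H0, 0.3/0.3/0.2/0.2 under H1.
pi = [Fr(1, 4)] * 4
p = [Fr(3, 10), Fr(3, 10), Fr(1, 5), Fr(1, 5)]
print(gof_parameters(200, pi, p))
print(round(approx_power(200, pi, p), 3))
```

Here λ = 8 and λ' = 25/3, so the two parameters differ by about 4%, which gives a feel for the size of the step discussed in the footnote.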





For comparison of this approximate distribution with the exact one, we now proceed to find the exact moments of φ². It is known (e.g. Haldane, 1937) that under H_1 the expectations of the powers of the observed frequency n_i are

$$\begin{aligned}
\mathcal{E}(n_i) &= Np_i,\\
\mathcal{E}(n_i^2) &= N_{(2)}p_i^2 + Np_i,\\
\mathcal{E}(n_i^4) &= N_{(4)}p_i^4 + 6N_{(3)}p_i^3 + 7N_{(2)}p_i^2 + Np_i,\\
\mathcal{E}(n_i^2 n_j^2) &= N_{(4)}p_i^2p_j^2 + N_{(3)}(p_i^2p_j + p_ip_j^2) + N_{(2)}p_ip_j,
\end{aligned} \qquad (40)$$

etc., where N_{(r)} = N!/(N − r)!.


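Writing φ² in (28) in the form

$$\phi^2 = \frac{1}{N}\sum_{i=1}^{k}\frac{n_i^2}{\pi_i} - N,$$

we have

$$\mathcal{E}(\phi^2) = \frac{1}{N}\sum_{i=1}^{k}\frac{\mathcal{E}(n_i^2)}{\pi_i} - N.$$

Hence

$$\mu_1'(\phi^2) = (N-1)\sum_i\frac{p_i^2}{\pi_i} + \sum_i\frac{p_i}{\pi_i} - N. \qquad (41)$$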


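Again,

$$\mathcal{E}(\phi^4) = \frac{1}{N^2}\,\mathcal{E}\left[\sum_i\frac{n_i^4}{\pi_i^2} + \sum_{i\ne j}\frac{n_i^2n_j^2}{\pi_i\pi_j}\right] - 2\,\mathcal{E}\left[\sum_i\frac{n_i^2}{\pi_i}\right] + N^2,$$

from which on substitution and simplification we obtain

$$\mu_2(\phi^2) = \frac{1}{N}\left\{(N-1)(6-4N)\Big[\sum_i\frac{p_i^2}{\pi_i}\Big]^2 + 4(N-1)(N-2)\sum_i\frac{p_i^3}{\pi_i^2} - 4(N-1)\sum_i\frac{p_i^2}{\pi_i}\sum_i\frac{p_i}{\pi_i} + 6(N-1)\sum_i\frac{p_i^2}{\pi_i^2} - \Big[\sum_i\frac{p_i}{\pi_i}\Big]^2 + \sum_i\frac{p_i}{\pi_i^2}\right\}. \qquad (42)$$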

In a similar way the third moment has also been obtained, but the expression is so long and so difficult to evaluate numerically that it may not be of much value for comparison purposes.

When p_i = π_i the above expressions reduce to those derived by Haldane (1937) for the exact moments of the distribution of φ² under the null hypothesis.

The approximation to the distribution of φ² obtained using the simplification of §3·1 will have the following first two moments:

$$\begin{aligned}
\mu_1' &= \nu + \lambda = k - 1 + \lambda = k - 1 + N\Big[\sum_i (p_i^2/\pi_i) - 1\Big],\\
\mu_2 &= 2(\nu + 2\lambda) = 2(k-1) + 4\lambda = 2(k-1) + 4N\Big[\sum_i (p_i^2/\pi_i) - 1\Big],
\end{aligned} \qquad (43)$$

using the expression for λ in (37).

A comparison of these approximate moments with the exact ones, (41) and (42), appears to be possible only numerically. Some comparisons have been made, including a check on the whole distribution by a random sampling experiment. In the cases taken, the approximation appeared satisfactory for practical purposes, but some further investigation is in hand. The results will be published in a subsequent paper.
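For small N, formulae (41) and (42) can be checked without any approximation by summing over every possible multinomial outcome. The sketch below does this in exact rational arithmetic for an invented three-class example (π = ½, 3/10, 1/5; p = 2/5, 7/20, ¼; N = 6) and prints the approximate moments (43) alongside. It is a verification aid, not the procedure used in the paper, and the function names are hypothetical.

```python
import itertools
import math
from fractions import Fraction as Fr


def s(pi, p, a, b):
    """Helper: sum over classes of p_i^a / pi_i^b."""
    return sum(pk ** a / qk ** b for pk, qk in zip(p, pi))


def moments_41_42(N, pi, p):
    """Exact mean (41) and variance (42) of phi^2 under H1."""
    s21, s11 = s(pi, p, 2, 1), s(pi, p, 1, 1)
    mean = (N - 1) * s21 + s11 - N
    var = ((N - 1) * (6 - 4 * N) * s21 ** 2
           + 4 * (N - 1) * (N - 2) * s(pi, p, 3, 2)
           - 4 * (N - 1) * s21 * s11
           + 6 * (N - 1) * s(pi, p, 2, 2)
           - s11 ** 2 + s(pi, p, 1, 2)) / N
    return mean, var


def moments_43(N, pi, p):
    """Approximate moments (43) of the non-central chi-square with k - 1 d.f."""
    k = len(pi)
    lam = N * (s(pi, p, 2, 1) - 1)          # the lambda of (37)
    return k - 1 + lam, 2 * (k - 1) + 4 * lam


def moments_by_enumeration(N, pi, p):
    """Mean and variance of phi^2 summed over every multinomial outcome."""
    m1 = m2 = Fr(0)
    for head in itertools.product(range(N + 1), repeat=len(pi) - 1):
        if sum(head) > N:
            continue
        counts = head + (N - sum(head),)
        prob = Fr(math.factorial(N))
        for n_i, p_i in zip(counts, p):
            prob *= p_i ** n_i / math.factorial(n_i)
        phi2 = sum((n_i - N * q) ** 2 / (N * q) for n_i, q in zip(counts, pi))
        m1 += prob * phi2
        m2 += prob * phi2 ** 2
    return m1, m2 - m1 ** 2


pi = [Fr(1, 2), Fr(3, 10), Fr(1, 5)]
p = [Fr(2, 5), Fr(7, 20), Fr(1, 4)]
print([float(v) for v in moments_41_42(6, pi, p)])
print([float(v) for v in moments_43(6, pi, p)])
```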


4·3. Uses of the power function of the χ² goodness of fit test
We have seen in §4·2 that, to the approximation involved, the power of the χ²-test for H_0 with regard to an alternative H_1 is a function of k − 1, λ and α, and can be written β(k − 1, λ, α), where k is the number of groups, α the significance level at which the test is applied and

$$\lambda = N\sum_{i=1}^{k}\frac{(p_i - \pi_i)^2}{\pi_i} = N\,\Delta(H_0, H_1).$$

This shows that Δ is a function of the π_i and p_i only, and can be regarded as a measure of 'discrepancy' between the two distribution functions specified by H_0 and H_1.
The power function can be used to answer several questions connected with the test of goodness of fit: (a) For given sample size N and number of groups k, we may ask what is the chance of establishing the inadequacy of the hypothesis H_0, using a given significance level. (b) For given k, we may ask how many observations are necessary to give a chance of, say, 90% of establishing significance at the 5% level. (c) For given k and N, we may ask how large a departure of H_1 from H_0 (measured by Δ(H_0, H_1)) will be detected with a given chance.
We shall illustrate these applications by an example from genetics. Consider the intercross

$$\frac{AB}{ab} \times \frac{AB}{ab},$$

where A and B are two independent factors, the recessive genes of which are represented by a and b. The offspring are of the four types [AB], [Ab], [aB], [ab] with frequencies in the proportions 9 : 3 : 3 : 1. We test whether the experiment confirms this theory or leads us to reject it in favour of a definite alternative giving frequencies proportional to 9 : 3 : 3r : r (r being less than 1). This happens when the two classes of offspring containing the two recessive genes (a, a) are less viable than those containing only one dominant gene, so that only a fraction of the offspring survive.

Here, the expected proportions are

π_i: 9/16, 3/16, 3/16, 1/16;
p_i: 9/[4(3 + r)], 3/[4(3 + r)], 3r/[4(3 + r)], r/[4(3 + r)].

Hence

$$\lambda = \frac{3N(1-r)^2}{(3+r)^2}, \qquad (44)$$

where N is the number of offspring studied. Then

$$\Delta = \frac{3(1-r)^2}{(3+r)^2}. \qquad (45)$$

Let us now consider the three situations where the power-function idea could be applied.

(a) Suppose we have 100 observations. Using the χ²-test at the 5% level to test the null hypothesis (r = 1), the chance of establishing differential viability when r = ½ is obtained by evaluating λ from (37) and then entering Table 6 (p. 214) with this λ and n = k − 1 = 3. Here λ = 300/49 and so the power β = 0·52.

(b) Suppose we want a 90% chance of detecting that r = ½, using the 5% significance level. We find from Table 6 that λ = 14·1 and hence, putting r = ½ in (45), obtain Δ = 3/49. Then from (44) we find that we shall need a sample of N = 230.

(c) Again, if N = 100 and α = 0·05, we may ask how small r must be to give a 50 : 50 chance of establishing significance. We find λ as before and solve (44) for r. Thus, taking β = 0·50, we have λ = 5·8 and r = 0·51.
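The arithmetic of this genetic example can be reproduced as follows. The power values themselves are still read from Table 6 and are not computed here; the sketch only evaluates (44) and (45), confirms that (44) agrees with the general formula (37), and solves for N in problem (b) and for r in problem (c). The helper names are hypothetical.

```python
import math
from fractions import Fraction as Fr


def expected_proportions(r):
    """pi_i under H0 (9:3:3:1) and p_i under H1 (9:3:3r:r)."""
    pi = [Fr(9, 16), Fr(3, 16), Fr(3, 16), Fr(1, 16)]
    d = 4 * (3 + r)
    p = [Fr(9) / d, Fr(3) / d, 3 * r / d, r / d]
    return pi, p


def delta(r):
    """Discrepancy Delta(H0, H1) of equation (45)."""
    return 3 * (1 - r) ** 2 / (3 + r) ** 2


def lam(N, r):
    """lambda = N * Delta, equation (44)."""
    return N * delta(r)


def sample_size(lam_needed, r):
    """Problem (b): the N at which N * Delta(r) reaches the required lambda."""
    return lam_needed / delta(r)


def r_for_lambda(lam_value, N):
    """Problem (c): solve 3N(1 - r)^2 / (3 + r)^2 = lambda for r < 1."""
    t = math.sqrt(lam_value / (3 * N))    # t = (1 - r)/(3 + r), positive root
    return (1 - 3 * t) / (1 + t)


half = Fr(1, 2)
print(lam(100, half), delta(half))            # (a) and (b)
print(round(sample_size(14.1, half)))         # (b): lambda = 14.1 from Table 6
print(round(r_for_lambda(5.8, 100), 2))       # (c): lambda = 5.8 from Table 6
```

The positive square root is taken in `r_for_lambda` because r < 1 makes (1 − r)/(3 + r) positive.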


4·4. A closer approximation to the power function of the χ² goodness of fit test
In §4·2, when deriving the χ'²-approximation to the distribution of

$$\phi^2 = \sum_{i=1}^{k}\frac{(n_i - N\pi_i)^2}{N\pi_i},$$

we made the assumption that π_i and p_i, the proportions of the expected frequencies under the hypotheses H_0 and H_1, do not differ very much, so that we could regard (n_i − Np_i)/√(Nπ_i) as a normal deviate with zero mean and unit variance. We will now consider the distribution of φ² without making such an assumption and use it for obtaining a better approximation to the power function.


We can write φ² in the form

$$\phi^2 = \sum\left[\sqrt{\frac{p_i}{\pi_i}}\,\frac{n_i - Np_i}{\sqrt{Np_i}} + \frac{Np_i - N\pi_i}{\sqrt{N\pi_i}}\right]^2, \qquad (46)$$

the summation being from i = 1 to k. Now, under H_1 the quantities (n_i − Np_i)/√(Np_i) are distributed approximately normally, as N(0, 1), subject to the constraint Σn_i = N. Hence φ² in (46) can be regarded as the weighted sum of squares of k normal deviates having different expectations and satisfying the condition Σn_i = N.

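We have obtained in the Appendix (pp. 231-2 below) the characteristic function of the distribution of such a statistic, namely a sum of squares of weighted normal deviates with non-zero expectations, subject to a linear condition on the deviates. Making the appropriate substitution in (6) of the Appendix, we have the characteristic function of φ²:

$$\left[\sum\frac{p}{1-2itp/\pi}\right]^{-1/2}\prod\left(1-\frac{2itp}{\pi}\right)^{-1/2}\exp\left\{\frac{N}{2}\left[(1-2it)\sum\frac{p-\pi}{1-2itp/\pi} - \frac{\left(\sum\frac{p-\pi}{1-2itp/\pi}\right)^2}{\sum\frac{p}{1-2itp/\pi}}\right]\right\}, \qquad (47)$$

where the subscripts of p_i and π_i are dropped. From this the expressions for the first three moments are derived. Thus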


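$$\begin{aligned}
\mu_1' &= (N-1)\sum\frac{p_i^2}{\pi_i} + \sum\frac{p_i}{\pi_i} - N,\\
\mu_2 &= 2\sum\frac{p_i^2}{\pi_i^2} + 4(N-1)\sum\frac{p_i^3}{\pi_i^2} - 2(2N-1)\Big[\sum\frac{p_i^2}{\pi_i}\Big]^2,\\
\mu_3 &= 24(N-1)\sum\frac{p_i^4}{\pi_i^3} - 24(2N-1)\Big[\sum\frac{p_i^3}{\pi_i^2}\Big]\Big[\sum\frac{p_i^2}{\pi_i}\Big] + 8(3N-1)\Big[\sum\frac{p_i^2}{\pi_i}\Big]^3 + 8\sum\frac{p_i^3}{\pi_i^3}.
\end{aligned} \qquad (48)$$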

It will be seen that the only assumption made here, that (n_i − Np_i)/√(Np_i) is distributed as N(0, 1) under H_1, is parallel to the assumption on which the χ²-test of goodness of fit is based, namely, that (n_i − Nπ_i)/√(Nπ_i) is distributed as N(0, 1) under H_0, which is justified when the Nπ_i are not too small. So, when the Np_i are not too small, we can expect the moments in (48) to agree well with the true moments (the first two of which are given in (41) and (42)). Obviously the expressions for μ'_1 are identical. The values of μ_2 in the cases examined in the investigation referred to on p. 217 were found to be very close.

We may now obtain a representation of the distribution of φ² under H_1 as a Type III having the first two moments of (48), that is, assume φ²/ρ to be distributed as χ² with ν degrees of freedom, where ρ = μ_2/(2μ'_1) and ν = 2μ'_1²/μ_2. Clearly this will be a better approximation than that of the Type III fitted from the moments given in (43), and the power function based on it will be closer to the exact one than that based on (38) and (39). But, although there is a gain in accuracy, the simplicity of the approximate method is lost. We may similarly consider fitting a Type III distribution using the true μ'_1 and μ_2, but the labour of computing μ_2, given in (42), appears to be prohibitive.
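The characteristic function (47) and the moments (48) can be cross-checked numerically. The sketch below replaces it in (47) by a small real argument t, so that the logarithm of (47) becomes a cumulant generating function, and differentiates it by finite differences at t = 0; the results should agree with (48). It also returns the ρ and ν of the Type III fit just described. The example is the genetic one of §4·3 (N = 100, r = ½); the step size h and all function names are assumptions made for illustration.

```python
import math


def s(pi, p, a, b):
    """Helper: sum over classes of p_i^a / pi_i^b."""
    return sum(pk ** a / qk ** b for pk, qk in zip(p, pi))


def moments_48(N, pi, p):
    """mu'_1, mu_2 and mu_3 of phi^2 under the normal model, equation (48)."""
    m1 = (N - 1) * s(pi, p, 2, 1) + s(pi, p, 1, 1) - N
    m2 = (2 * s(pi, p, 2, 2) + 4 * (N - 1) * s(pi, p, 3, 2)
          - 2 * (2 * N - 1) * s(pi, p, 2, 1) ** 2)
    m3 = (24 * (N - 1) * s(pi, p, 4, 3)
          - 24 * (2 * N - 1) * s(pi, p, 3, 2) * s(pi, p, 2, 1)
          + 8 * (3 * N - 1) * s(pi, p, 2, 1) ** 3 + 8 * s(pi, p, 3, 3))
    return m1, m2, m3


def cgf_47(t, N, pi, p):
    """Logarithm of (47) with it replaced by a small real t."""
    w = [1.0 / (1.0 - 2.0 * t * pk / qk) for pk, qk in zip(p, pi)]
    S = sum(pk * wk for pk, wk in zip(p, w))
    U = sum((pk - qk) * wk for pk, qk, wk in zip(p, pi, w))
    return (-0.5 * math.log(S) + 0.5 * sum(math.log(wk) for wk in w)
            + 0.5 * N * ((1.0 - 2.0 * t) * U - U * U / S))


def cumulants_from_cgf(N, pi, p, h=1e-3):
    """First three cumulants by central finite differences of the CGF at t = 0."""
    K = lambda t: cgf_47(t, N, pi, p)
    k1 = (K(h) - K(-h)) / (2 * h)
    k2 = (K(h) - 2 * K(0.0) + K(-h)) / h ** 2
    k3 = (K(2 * h) - 2 * K(h) + 2 * K(-h) - K(-2 * h)) / (2 * h ** 3)
    return k1, k2, k3


def type3_fit(N, pi, p):
    """rho and nu such that phi^2 / rho ~ chi^2 with nu d.f., from (48)."""
    m1, m2, _ = moments_48(N, pi, p)
    return m2 / (2 * m1), 2 * m1 ** 2 / m2


pi = [9 / 16, 3 / 16, 3 / 16, 1 / 16]
p = [9 / 14, 3 / 14, 3 / 28, 1 / 28]          # r = 1/2
print(moments_48(100, pi, p))
print(cumulants_from_cgf(100, pi, p))
print(type3_fit(100, pi, p))
```

Under H_0 (p_i = π_i) the three moments reduce to k − 1, 2(k − 1) and 8(k − 1), the cumulants of χ² with k − 1 degrees of freedom.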


5. Conditional power functions

In §4 we have considered the power function of the χ² goodness of fit test when the null hypothesis is fully specified, i.e. is a simple hypothesis. But often we are interested in testing whether an observed sample has come from a certain type of population, so that we are given only the form of the population law, not the values of its parameters, say θ_1, θ_2, ..., θ_r. H_0 is then a composite hypothesis. Sometimes, also, we have to test the hypothesis that several samples are from the same population, without specifying anything about it. In these cases we obtain estimates of the unspecified parameters, say T_1, ..., T_r, from the sample and hence calculate the expected cell frequencies m_i. Then, if the method of estimation is efficient,†

$$\phi^2 = \sum_{i=1}^{k}\frac{(n_i - m_i)^2}{m_i} \qquad (49)$$

is known still to follow approximately a χ²-distribution with k − r − 1 degrees of freedom.

Suppose now that, as an alternative to the composite hypothesis H_0, there is a simple hypothesis H_1. The question then arises: by estimating the m_i on the assumption that H_0 is true and applying the χ²-test, what chance have we of rejecting H_0 when, in fact, H_1 is true?

Some consideration has been given to this problem, and it seems possible to obtain a solution by making use, as a first step, of what David (1947, p. 339) has termed the conditional power function. This gives the chance of rejecting H_0 when the test is confined to a restricted set, S, of samples which provide the same values, say T_1^{(s)}, T_2^{(s)}, ..., for the estimated parameters. Thus, if the process of fitting involves estimating two parameters from the sample mean and variance, samples of a set would be those having a common mean and variance. Again, in testing for independence in a contingency table, the conditional power function would be obtained for a set of samples giving the same marginal totals (see Patnaik, 1948). The development of this method will be left for a later communication.


6. The non-central F-distribution and approximations to it

Suppose two independent variates, χ'_1² and χ_2², follow respectively a non-central χ²-distribution with ν_1 degrees of freedom and parameter λ, and a χ²-distribution with ν_2 degrees of freedom. Then the ratio

$$F' = \frac{\chi_1'^2/\nu_1}{\chi_2^2/\nu_2}$$

will have the following probability distribution:

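$$p_{\nu_1,\nu_2}(F'\mid\lambda) = e^{-\lambda/2}\sum_{j=0}^{\infty}\frac{(\lambda/2)^j}{j!}\,\frac{(\nu_1/\nu_2)^{\frac12\nu_1+j}\,F'^{\,\frac12\nu_1+j-1}}{B\!\left(\frac12\nu_1+j,\ \frac12\nu_2\right)\left(1+\nu_1F'/\nu_2\right)^{\frac12(\nu_1+\nu_2)+j}}, \qquad (50)$$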

which may be termed the distribution of non-central F or of F'. This corresponds to Fisher's distribution C (1928). Wishart (1932) considered it in the form of the distribution of the correlation ratio

$$\eta^2 = \frac{\nu_1F'}{\nu_2 + \nu_1F'}.$$

Later, Tang (1938) derived the same from that of χ'².

If in (50) we put ν_1 = 1, then it reduces to the distribution of non-central t². Denoting the non-central t by t', we have

$$t' = \frac{z + \delta}{\sqrt{w}},$$

where z is a normal deviate with expected value zero and w is an unbiased estimate of its variance. Neyman (1935), Neyman & Tokarska (1936) and Johnson & Welch (1939) have dealt with this distribution in detail and studied its various applications. We will not therefore consider this special case of the F'-distribution here.

† … gives a solution not very different from the maximum likelihood or minimum χ² solutions, which are nearly identical in large samples.

Taking the general form (50), we may, by analogy, call ν_1, ν_2 the degrees of freedom and λ the non-central parameter. It can be seen that when ν_2 tends to infinity the distribution of F' reduces to that of χ'_1²/ν_1.

The characteristic function is obtained as an infinite sum of confluent hypergeometric functions H(a, b, x), where H(a, b, x) is the sum of the series

$$H(a, b, x) = 1 + \frac{a}{b}\,x + \frac{a(a+1)}{2!\,b(b+1)}\,x^2 + \cdots.$$

Thence we derive the following expressions for the first four moments about the origin:

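$$\begin{aligned}
\mu_1' &= \frac{\nu_2(\nu_1+\lambda)}{\nu_1(\nu_2-2)},\\
\mu_2' &= \frac{\nu_2^2}{\nu_1^2(\nu_2-2)(\nu_2-4)}\Big[(\nu_1+\lambda)^2 + 2(\nu_1+2\lambda)\Big],\\
\mu_3' &= \frac{\nu_2^3}{\nu_1^3(\nu_2-2)(\nu_2-4)(\nu_2-6)}\Big[(\nu_1+\lambda)^3 + 6(\nu_1+\lambda)(\nu_1+2\lambda) + 8(\nu_1+3\lambda)\Big],\\
\mu_4' &= \frac{\nu_2^4}{\nu_1^4(\nu_2-2)(\nu_2-4)(\nu_2-6)(\nu_2-8)}\Big[(\nu_1+\lambda)^4 + 12(\nu_1+\lambda)^2(\nu_1+2\lambda) + 44(\nu_1+2\lambda)^2 + 48(\nu_1+4\lambda) - 32\lambda^2\Big],
\end{aligned} \qquad (51)$$

of which the first two were obtained by Wishart by a different method.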

Methods of evaluating the probability integral of the F'-distribution have been worked out by Wishart and Tang. They involve a considerable amount of labour. Following the procedure adopted in the case of χ'², it may be possible to obtain a quick, though approximate, method by fitting an F-distribution with the exact first two moments of F'. If we regard F'/k as following an F-distribution with ν and ν_2 degrees of freedom, then, equating the expressions for μ'_1 and μ'_2, we have

$$\frac{\nu_2(\nu_1+\lambda)}{\nu_1(\nu_2-2)} = \frac{k\,\nu_2}{\nu_2-2}, \qquad \frac{\nu_2^2\big[(\nu_1+\lambda)^2 + 2(\nu_1+2\lambda)\big]}{\nu_1^2(\nu_2-2)(\nu_2-4)} = \frac{k^2\nu_2^2(\nu+2)}{\nu(\nu_2-2)(\nu_2-4)},$$

which give the scale factor and the modified degrees of freedom, viz.

$$k = \frac{\nu_1+\lambda}{\nu_1}, \qquad \nu = \frac{(\nu_1+\lambda)^2}{\nu_1+2\lambda}. \qquad (52)$$

The same result will follow if we approximate the distribution of χ'_1² (the numerator in F') by a Type III from the first two moments as in §3·1.
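The moment-matching that leads to (52) can be confirmed in exact arithmetic. The sketch below evaluates the moments (51), fits F'/k as F(ν, ν_2) with k and ν from (52), and checks that the scaled F moments coincide with the first two moments of F'; it also exposes the raw moments of χ'² so that the fourth one can be checked against its cumulants. The function names are hypothetical, and ν_2 must exceed 8 for all four moments in (51) to exist.

```python
from fractions import Fraction as Fr


def chi_prime_raw_moments(nu1, lam):
    """First four moments about the origin of chi'^2 with nu1 d.f. and parameter lam."""
    a, b = nu1 + lam, nu1 + 2 * lam
    return [a,
            a ** 2 + 2 * b,
            a ** 3 + 6 * a * b + 8 * (nu1 + 3 * lam),
            a ** 4 + 12 * a ** 2 * b + 44 * b ** 2 + 48 * (nu1 + 4 * lam) - 32 * lam ** 2]


def fprime_moments(nu1, nu2, lam):
    """Moments about the origin of F', equation (51); requires nu2 > 8."""
    raw = chi_prime_raw_moments(nu1, lam)
    out, denom = [], Fr(1)
    for r in range(1, 5):
        denom *= nu2 - 2 * r                  # E[(chi^2_nu2)^-r] denominators
        out.append(Fr(nu2, nu1) ** r * raw[r - 1] / denom)
    return out


def patnaik_fit(nu1, lam):
    """Scale factor k and modified degrees of freedom nu of equation (52)."""
    return Fr(nu1 + lam, nu1), Fr((nu1 + lam) ** 2, nu1 + 2 * lam)


def central_f_moments(nu, nu2):
    """First two moments about the origin of F with (nu, nu2) d.f."""
    m1 = Fr(nu2) / (nu2 - 2)
    m2 = Fr(nu2) ** 2 * (nu + 2) / (nu * (nu2 - 2) * (nu2 - 4))
    return m1, m2


k, nu = patnaik_fit(3, 4)
print(k, nu, fprime_moments(3, 10, 4)[:2])
```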

Using the above approximation, the probability integral

$$\int_0^{F'} p_{\nu_1,\nu_2}(F'\mid\lambda)\,dF'$$

is approximately equal to

$$\int_0^{F'/k} p_{\nu,\nu_2}(F)\,dF,$$

where k and ν are defined in (52). This can be expressed in the form of an Incomplete B-function, viz. I_x(p, q), where

$$p = \tfrac12\nu, \qquad q = \tfrac12\nu_2, \qquad x = \frac{\nu F'/k}{\nu_2 + \nu F'/k}.$$

For given values of ν_1, ν_2, λ and F', we can therefore evaluate the integral from the Tables of the Incomplete B-function (K. Pearson, 1934). When ν_2 is even or, if odd, is less than 22, we need interpolate only for x and ½ν (= p). Four-point Lagrangian interpolation p-wise and linear interpolation x-wise will be necessary.

Tang's tables of P_II (the error of the second kind) (1938) give exact values of the integral of the corresponding distribution, which, put in the F'-form, is

$$\int_0^{F_\alpha} p_{\nu_1,\nu_2}(F'\mid\lambda)\,dF', \qquad (53)$$

F_α being the α-percentage point of the F-distribution with ν_1, ν_2 degrees of freedom. Two levels of α were chosen for the tables, namely 0·05 and 0·01, and the range of ν_1 is 1 to 8. The tables have to be entered with φ = √[λ/(ν_1 + 1)]. Since φ is at intervals of 0·5, the corresponding intervals for λ are very wide, which makes interpolation unsatisfactory.

Table 7. Approximate and exact values of the F'-integral, ∫_0^x p_{ν1,ν2}(F'|λ) dF'

 ν1   ν2    λ      x      Approx.   Exact
  3   10    4    3·708     0·752    0·745
  3   10    4    6·552     0·919    0·918
  3   10   16    3·708     0·203    0·200
  3   10   16    6·552     0·520    0·517
  3   20    4    3·098     0·706    0·700
  3   20    4    4·938     0·889    0·887
  3   20   16    3·098     0·119    0·126
  3   20   16    4·938     0·360    0·347
  5   10    6    3·326     0·731    0·731
  5   10    6    5·636     0·913    0·914
  5   10   24    3·326     0·157    0·158
  5   10   24    5·636     0·463    0·461
  5   20    6    2·711     0·665    0·664
  5   20    6    4·103     0·869    0·870
  5   20   24    2·711     0·064    0·069
  5   20   24    4·103     0·244    0·245
  8   10    9    3·072     0·715    0·714
  8   10    9    5·057     0·909    0·908
  8   10   36    3·072     0·117    0·119
  8   10   36    5·057     0·409    0·408
  8   30    9    2·266     0·581    0·578
  8   30    9    3·173     0·815    0·813
  8   30   36    2·266     0·014    0·017
  8   30   36    3·173     0·085    0·088

Table 7 gives the values of the integral (53) calculated by the approximate method indicated above, for certain cases where Tang's exact values are available. The comparison shows that, in general, we have two-figure accuracy, while the error in the third place appears to be quite small near the tails.†

It is to be noted that the table compares the integral at only two points, the 5% and 1% points of the corresponding F-distribution. Owing to the lack of exact values it has not been possible to judge the closeness at other points. However, some idea of the general accuracy could be had by comparing the true and approximate figures for different λ's with the same ν_1, ν_2 and x (= F_α).

It can easily be shown (see Hartley, 1948) that the maximum error in the F'-integral due to our approximation will not exceed the maximum error in the corresponding χ'²-integral, that is, in

$$\int_0^{x} p_{\nu_1}(\chi'^2\mid\lambda)\,d\chi'^2.$$

Table 1 on p. 207 gives an idea of the magnitude of the errors in the χ'²-integral, and so we can say that the errors in the F'-integral will not be of a higher order.

The percentage points of F' can be obtained by interpolation in the F-tables (Merrington & Thompson, 1943) for the (fractional) ν and ν_2, and multiplying the interpolate by k in (52).

Closer approximations to the probability integral and percentage points may be derived by the method based on the Gram-Charlier series, analogous to the second method of §3·3.
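Parts of Table 7, and the percentage-point rule just described, can be reproduced by machine. In the sketch below the approximate column uses (52) and the incomplete B-function, evaluated by a standard continued fraction instead of Pearson's tables, while the "exact" column uses the usual Poisson-mixture form of the non-central F-distribution; that representation is an assumption of the sketch and not how Tang computed his tables. All names are hypothetical.

```python
import math


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b) (Lentz continued fraction)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):      # use I_x(a, b) = 1 - I_{1-x}(b, a)
        return 1.0 - betainc(b, a, 1.0 - x)
    tiny = 1e-300
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 1000):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for coef in (even, odd):
            d = 1.0 + coef * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coef / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-15:
            break
    return front * h / a


def f_cdf(f, nu1, nu2):
    """P(F <= f) for the central F-distribution with (nu1, nu2) d.f."""
    return betainc(nu1 / 2.0, nu2 / 2.0, nu1 * f / (nu2 + nu1 * f)) if f > 0 else 0.0


def ncf_cdf(f, nu1, nu2, lam):
    """P(F' <= f) as a Poisson(lam/2) mixture of incomplete B-functions."""
    x = nu1 * f / (nu2 + nu1 * f)
    half = lam / 2.0
    weight, total = math.exp(-half), 0.0
    for j in range(10_000):
        if j > 0:
            weight *= half / j
        total += weight * betainc(nu1 / 2.0 + j, nu2 / 2.0, x)
        if j > half and weight < 1e-17:
            break
    return total


def patnaik_f_cdf(f, nu1, nu2, lam):
    """Approximation of section 6: F'/k treated as F with (nu, nu2) d.f., eq. (52)."""
    k = (nu1 + lam) / nu1
    nu = (nu1 + lam) ** 2 / (nu1 + 2 * lam)
    return f_cdf(f / k, nu, nu2)


def f_ppf(q, nu1, nu2):
    """Percentage point of central F (fractional nu1 allowed), by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, nu1, nu2) < q:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, nu1, nu2) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fprime_ppf_approx(q, nu1, nu2, lam):
    """Approximate percentage point of F': k times the F point for (nu, nu2) d.f."""
    k = (nu1 + lam) / nu1
    nu = (nu1 + lam) ** 2 / (nu1 + 2 * lam)
    return k * f_ppf(q, nu, nu2)


for nu1, nu2, lam, x in [(3, 10, 4, 3.708), (5, 10, 6, 3.326)]:
    print(nu1, nu2, lam, x,
          round(patnaik_f_cdf(x, nu1, nu2, lam), 3), round(ncf_cdf(x, nu1, nu2, lam), 3))
```

The two printed rows should agree, to about the third decimal, with the corresponding rows of Table 7 (0·752 and 0·745; 0·731 and 0·731).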


7. The power function of the analysis of variance tests

7·1. Evaluation of the power function

The test of a general linear hypothesis may be formulated as follows. Suppose x_1, x_2, ..., x_N are N normal variates with means ξ_1, ξ_2, ..., ξ_N and the same s.d., σ. Each ξ_i is a linear function of s < N parameters θ_1, θ_2, ..., θ_s. Thus

$$\xi_i = a_{i1}\theta_1 + a_{i2}\theta_2 + \dots + a_{is}\theta_s.$$

The linear hypothesis specifies, say, r of these parameters, i.e.

$$\theta_1 = \theta_1^0, \quad \theta_2 = \theta_2^0, \quad \dots, \quad \theta_r = \theta_r^0. \qquad (54)$$

It is possible by a suitable transformation of variates (see Tang, 1938) of the form

$$y_j = c_{j1}x_1 + c_{j2}x_2 + \dots + c_{jN}x_N$$

to transform

$$\sum_{i=1}^{N}(x_i - \xi_i)^2$$

into

$$\sum_{j=1}^{N-s} y_j^2 + \sum_{j=N-s+1}^{N-s+r}(y_j - \eta_j)^2 + \sum_{j=N-s+r+1}^{N}(y_j - \eta_j)^2,$$

where η_j in the second sum is a linear function of θ_1, θ_2, ..., θ_r and η_j in the third sum is a linear function of all the θ's, while the a's and c's enter as coefficients.


† [Further exploration shows that the differences between the approximate and true values are systematic, with regular fluctuations. Use is being made of this fact to prepare certain rather more extensive tables of the power function. Ed.]


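To test the hypothesis (54) we consider the criterion

$$\min_{\theta_{r+1},\dots,\theta_s}\sum_{i=1}^{N}\big(x_i - \xi_i(\theta_1^0, \dots, \theta_r^0, \theta_{r+1}, \dots, \theta_s)\big)^2 - \min_{\theta_1,\dots,\theta_s}\sum_{i=1}^{N}(x_i - \xi_i)^2 = \sum_{j=N-s+1}^{N-s+r}(y_j - \eta_j)^2. \qquad (55)$$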


If the hypothesis specifies such values for θ_1, ..., θ_r that the η_j's in (55) vanish, then the numerator and denominator are the sums of r and N − s central squares respectively. So the ratio of the mean squares follows an F-distribution. On the other hand, if the η_j's do not all vanish, we have the ratio of a sum of r non-central squares to the sum of N − s central squares; hence the ratio of the mean squares is distributed as non-central F, the parameter λ being Ση_j², which can be expressed in terms of θ_1^0, ..., θ_r^0 (see Tang, p. 137). Thus we get the F-test of the analysis of variance and obtain the power function of this test with respect to an alternative hypothesis as an F'-integral.

We shall now consider the question of evaluating the power of the analysis of variance test by taking as an illustration the simple case of k groups of n observations

x_ti (i = 1, ..., n; t = 1, ..., k).

Suppose

$$x_{ti} = A + B_t + z_{ti}, \qquad (56)$$

where A is the general mean, B_t the deviation of the mean of the tth group from the general mean, so that ΣB_t = 0, and the z_ti are random residuals, distributed normally with mean zero and s.d. σ_0. The expressions for the mean squares between groups and within groups follow from the set-up (56):

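$$v = \frac{n}{k-1}\sum_{t=1}^{k}(\bar x_{t\cdot} - \bar x_{\cdot\cdot})^2 = \frac{n}{k-1}\sum_{t=1}^{k}(\bar z_{t\cdot} - \bar z_{\cdot\cdot} + B_t)^2,$$

$$v_0 = \frac{1}{k(n-1)}\sum_{t=1}^{k}\sum_{i=1}^{n}(x_{ti} - \bar x_{t\cdot})^2 = \frac{1}{k(n-1)}\sum_{t=1}^{k}\sum_{i=1}^{n}(z_{ti} - \bar z_{t\cdot})^2,$$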

where the symbols have the usual meanings. Since (z̄_t· − z̄··) is a normal deviate with zero mean and variance σ_0²/n, we see that v is the sum of k non-central squares subject to the linear constraint

$$\sum_{t=1}^{k}(\bar z_{t\cdot} - \bar z_{\cdot\cdot} + B_t) = 0.$$

Since further ΣB_t = 0, we find from the result of §2·3 that v is distributed as σ_0²χ'²/(k − 1), where χ'² has k − 1 degrees of freedom and parameter

$$\lambda = n\sum_t B_t^2/\sigma_0^2.$$

Writing

$$\delta^2 = \Big(\sum_t B_t^2\Big)\Big/k \qquad (57)$$

for the variability between the groups, we have

$$\lambda = kn\delta^2/\sigma_0^2. \qquad (58)$$

Now v_0 follows the distribution of σ_0²χ_0²/[k(n − 1)], where χ_0² has k(n − 1) degrees of freedom. Hence v/v_0 is distributed as

$$\frac{\chi'^2/(k-1)}{\chi_0^2/[k(n-1)]},$$

i.e. as F' with ν_1 = k − 1, ν_2 = k(n − 1) and λ given by (58).

In this example we desire to test for any possible difference between the averages of the groups, so that our null hypothesis is

$$B_1 = B_2 = \dots = B_k = 0. \qquad (59)$$


Then, from (57), δ² and therefore λ is zero. Hence v/v_0 follows an F-distribution and we get an F-test. Thus the test of the hypothesis in (59) is based on the critical region

$$\frac{v}{v_0} > F_\alpha, \qquad (60)$$

where α is the significance level at which we are testing.

Let us consider an alternative hypothesis that the B_t's are not all zero. Then it is known that the power function, that is, the probability that (v/v_0) > F_α, depends only on the single parameter δ²/σ_0².

Hsu (1941) has shown that amongst all critical regions of size α whose power functions depend on the single parameter δ²/σ_0², the critical region of (60) is the most powerful.

Thus we specify the hypothesis alternative to the null hypothesis (59) by the single parameter δ²/σ_0² in place of the individual parameters, the B_t's. In certain situations, as, for instance, in a manufacturing process, we are more interested in detecting the over-all variability in a set of machines than in detecting the deviation of each particular machine from the general machine average. Then the power function will be useful in measuring the chance of detecting this over-all variability by means of the F-test.

The power function of the analysis of variance tests has been considered by Tang (1938) and Hsu (1941). The rather restricted scope of Tang's tables has already been mentioned in §6. The labour involved in computing the exact values of the power is very heavy, and no tabling on an extensive scale has so far been found possible. However, with the approximations to the F'-distribution derived in §6, we may easily obtain a sufficiently accurate value for the power function of the test of any linear hypothesis.

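Returning to the case of $k$ groups and $kn$ observations, we have the power function given by

$$\beta = \int_{F_\alpha}^{\infty} p_{\nu_1,\nu_2,\lambda}(F')\,dF',$$

where $F_\alpha$ is the $\alpha$ percentage point of the F-distribution with degrees of freedom $\nu_1, \nu_2$. Following the procedure of § 6, this integral approximately equals

$$\int_{F_\alpha \nu_1/(\nu_1+\lambda)}^{\infty} p_{\nu,\nu_2}(F)\,dF, \qquad \text{where } \nu = \frac{(\nu_1+\lambda)^2}{\nu_1+2\lambda}.$$

Therefore, to this approximation, we have

$$\beta \simeq I_x(\tfrac12\nu_2, \tfrac12\nu), \qquad (61)$$

in which

$$x = \frac{(\nu_1+2\lambda)\,\nu_2}{(\nu_1+2\lambda)\,\nu_2 + (\nu_1+\lambda)\,\nu_1 F_\alpha}. \qquad (62)$$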
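The approximation (61)–(62) is easy to evaluate by machine. The sketch below is a minimal, standard-library-only version. `betainc` is a hand-written continued-fraction routine for the incomplete B-function ratio $I_x(a,b)$, standing in for the printed Tables of the Incomplete B-Function. `f_upper` finds $F_\alpha$ by bisection instead of reading it from F-tables. All function names are illustrative. At $\lambda = 0$ the formula reduces to the central F-test, so the computed power should equal $\alpha$.

```python
import math


def betainc(a, b, x):
    """Incomplete B-function ratio I_x(a, b) by a continued fraction (Lentz's method)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        return 1.0 - betainc(b, a, 1.0 - x)  # symmetry I_x(a,b) = 1 - I_{1-x}(b,a)
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x)) / a
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 1000):
        for num in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                    -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-14:
            break
    return front * h


def f_sf(c, d1, d2):
    """Upper tail P(F > c) of the central F-distribution; d1, d2 need not be integers."""
    if c <= 0.0:
        return 1.0
    return betainc(0.5 * d2, 0.5 * d1, d2 / (d2 + d1 * c))


def f_upper(alpha, d1, d2):
    """Upper alpha percentage point F_alpha(d1, d2), found by bisection on f_sf."""
    lo, hi = 0.0, 1.0
    while f_sf(hi, d1, d2) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_sf(mid, d1, d2) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def approx_power(lam, nu1, nu2, f_alpha):
    """Approximate power (61)-(62) of the F-test for non-centrality lam."""
    nu = (nu1 + lam) ** 2 / (nu1 + 2.0 * lam)
    x = (nu1 + 2.0 * lam) * nu2 / ((nu1 + 2.0 * lam) * nu2 + (nu1 + lam) * nu1 * f_alpha)
    return betainc(0.5 * nu2, 0.5 * nu, x)


if __name__ == "__main__":
    nu1, nu2 = 4, 45
    f_alpha = f_upper(0.05, nu1, nu2)
    for lam in (0.0, 5.0, 10.0, 16.8, 25.0):
        print(f"lambda = {lam:5.1f}   approximate power = "
              f"{approx_power(lam, nu1, nu2, f_alpha):.4f}")
```

The bisection for $F_\alpha$ is slow but keeps the sketch free of external libraries. With a statistics library available, a tabulated quantile function would replace `f_upper`.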


7·2. The difference between systematic and random effects

Next we shall consider two alternatives that arise in practical situations, the random and systematic set-ups (see Daniels, 1939), which may best be described in terms of two examples. If the groups in the previous illustration correspond to villages and the observations are the

Biometrika 36 15 



226 The non-central χ²- and F-distributions and their applications

yields of fields in a crop survey, then we can regard the $k$ villages as a random sample from a population of villages, and the random set-up represented by

$$x_{ij} = A + y_i + z_{ij} \qquad (63)$$

becomes relevant. Here, $A$ is the general mean; the $y_i$'s are the group means (measured from $A$), which are independent random variables with expected value zero and s.d. $= \sigma$; and the $z_{ij}$'s are the random residuals, having mean zero and s.d. $= \sigma_0$.

On the other hand, if the groups correspond to $k$ machines which, from the user's standpoint, constitute the entire population of machines, we cannot regard them as a sample, and so the systematic set-up (56), considered on p. 224, is relevant. The null hypothesis in the random set-up is that the parameter $\sigma^2 = 0$, and in the other that $S^2 = 0$ (which is equivalent to (59)). But it is easily seen that both lead to the same F-test for the null hypothesis.

In applying the test, we are on the look-out for the existence of alternative conditions, where $\sigma^2 > 0$ in one case and $S^2 > 0$ in the other. It will be noted that $S^2/\sigma_0^2$ of the systematic set-up corresponds to $\sigma^2/\sigma_0^2$ of the random set-up. Both are measures of relative variability between groups and may be termed ‘relative group variability’.

It is possible to relate the power function under the random set-up to that under the systematic set-up. If we regard the $k$ groups as a sample from an infinite number of groups, then $\sum B_i^2/(k-1)$, i.e. $kS^2/(k-1)$, will be the sample estimate of the population variance $\sigma^2$. Thus, treating $S^2$ as a random variable having a probability distribution denoted by $p(S^2 \mid \sigma^2)$, we can obtain the average power over all the $S^2$'s. Thus

$$\bar\beta = \int_0^{\infty} \beta(S^2/\sigma_0^2)\, p(S^2 \mid \sigma^2)\, dS^2$$

gives the power when the random set-up applies.

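This power $\bar\beta$ for given $\sigma^2/\sigma_0^2$ is directly obtained (see Johnson, 1948) from the F-integral:

$$\bar\beta = \int_{F_\alpha/(n\sigma^2/\sigma_0^2+1)}^{\infty} p_{\nu_1,\nu_2}(F)\,dF = \int_{F_\alpha(\nu_1+1)/(\nu_1+1+\lambda)}^{\infty} p_{\nu_1,\nu_2}(F)\,dF,$$

where $\nu_1 = k-1$, $\nu_2 = k(n-1)$ and $\lambda = kn\sigma^2/\sigma_0^2$.

This can be put in the form of the Incomplete B-function

$$\bar\beta = I_y(\tfrac12\nu_2, \tfrac12\nu_1), \qquad (64)$$

where

$$y = \frac{(\nu_1+1+\lambda)\,\nu_2}{(\nu_1+1+\lambda)\,\nu_2 + (\nu_1+1)\,\nu_1 F_\alpha}. \qquad (65)$$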


It is interesting to note a result which we believe is true in general and which on intuitive grounds might be expected to hold, namely, if the null hypothesis is not true, then for the same numerical values of the ratios $S^2/\sigma_0^2$ and $\sigma^2/\sigma_0^2$, the power of the F-test is greater in the systematic case than in the random. Four particular cases have been examined numerically as follows:



Case    Number of groups, k    Number of observations in each group, n
(a)     4                      6
(b)     [illegible]            [illegible]
(c)     12                     6
(d)     10                     11




Values of the power have been calculated, using equations (62) and (65), and are plotted in Fig. 2(a)–(d) as ordinates against $\sigma^2/\sigma_0^2$ ($= S^2/\sigma_0^2$). We find from these that the systematic power curve lies above the other; further, we note that the curves are closer to one another in (c) and (d) than in (a) and (b), a fact which agrees with the theory that the two power functions must tend to each other with increasing $k$. The errors of approximation in calculating the power in the systematic case are likely to be small, judged by the comparative Table 7, and should not affect the relative positions of the power curves.
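The comparison in Fig. 2 can be reproduced in outline. In the random set-up the ratio of mean squares is $(1 + n\sigma^2/\sigma_0^2)$ times a central F variate, so (64)–(65) is exact there. The systematic power below uses the approximation (61)–(62). This is a minimal sketch for case (a), $k = 4$, $n = 6$. It does not claim to reproduce the plotted values. The helper names, and the hand-written incomplete B-function and bisection for $F_\alpha$, are illustrative assumptions.

```python
import math


def betainc(a, b, x):
    """Incomplete B-function ratio I_x(a, b) by a continued fraction (Lentz's method)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        return 1.0 - betainc(b, a, 1.0 - x)  # symmetry I_x(a,b) = 1 - I_{1-x}(b,a)
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x)) / a
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 1000):
        for num in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                    -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-14:
            break
    return front * h


def f_sf(c, d1, d2):
    """Upper tail P(F > c) of the central F-distribution."""
    return 1.0 if c <= 0.0 else betainc(0.5 * d2, 0.5 * d1, d2 / (d2 + d1 * c))


def f_upper(alpha, d1, d2):
    """Upper alpha percentage point of F, by bisection."""
    lo, hi = 0.0, 1.0
    while f_sf(hi, d1, d2) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f_sf(mid, d1, d2) > alpha else (lo, mid)
    return 0.5 * (lo + hi)


def approx_power(lam, nu1, nu2, f_alpha):
    """Approximate power (61)-(62) for non-centrality lam."""
    nu = (nu1 + lam) ** 2 / (nu1 + 2.0 * lam)
    x = (nu1 + 2.0 * lam) * nu2 / ((nu1 + 2.0 * lam) * nu2 + (nu1 + lam) * nu1 * f_alpha)
    return betainc(0.5 * nu2, 0.5 * nu, x)


def random_power(ratio, k, n, f_alpha):
    """Power (64)-(65) in the random set-up; ratio = sigma^2 / sigma0^2."""
    nu1, nu2 = k - 1, k * (n - 1)
    lam = k * n * ratio
    y = (nu1 + 1 + lam) * nu2 / ((nu1 + 1 + lam) * nu2 + (nu1 + 1) * nu1 * f_alpha)
    return betainc(0.5 * nu2, 0.5 * nu1, y)


def systematic_power(ratio, k, n, f_alpha):
    """Approximate power (61)-(62) in the systematic set-up; ratio = S^2 / sigma0^2."""
    return approx_power(k * n * ratio, k - 1, k * (n - 1), f_alpha)


if __name__ == "__main__":
    k, n = 4, 6                                  # case (a)
    f_alpha = f_upper(0.05, k - 1, k * (n - 1))
    for ratio in (0.0, 0.25, 0.45, 0.9, 1.5):
        print(f"{ratio:4.2f}  systematic {systematic_power(ratio, k, n, f_alpha):.3f}"
              f"  random {random_power(ratio, k, n, f_alpha):.3f}")
```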



[Fig. 2, panels (a)–(d); recoverable panel labels: (c) k = 12, n = 6; (d) k = 10, n = 11.]

Fig. 2. Power curves for the random and systematic set-ups for k groups with n observations in each.

This relation may be interpreted in a different way. Taking case (a) above, it will be seen from Fig. 2(a) that we can detect, for instance, a ‘systematic’ relative group variability of 0.45 with a 70% chance, while we cannot, with the same chance, detect a variability of magnitude less than 0.9 in the random case. The difference is of course to be expected. For the random set-up, our appreciation of $\sigma^2$ is obscured by random variations in both $y$ and $z$ of equation (63); for the systematic set-up, our appreciation of $S^2$ is obscured only by random fluctuations in the $z$ of equation (56).




7·3. Applications of the power function

We will be concerned here mainly with the systematic set-up and will illustrate the application of our results, taking the simple case of $k$ groups and $n$ observations. The treatment is, however, quite general and could be applied to any designed experiment as outlined in the general statement given at the beginning of § 7·1.

Two types of question may be asked in connexion with the test for differences between groups:

(a) What is the extent of departure from the null hypothesis, measured by $S^2/\sigma_0^2$, that could be detected with a given chance?

(b) How many observations are we to take in each group so that we could detect a given ratio of between-group to within-group variability ($S^2/\sigma_0^2$) with a prescribed chance?

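To answer these questions we have to examine the function $\beta(S^2/\sigma_0^2)$, which may be written in the form

$$\beta(\nu_1,\nu_2,\lambda,\alpha) = e^{-\lambda/2}\sum_{j=0}^{\infty}\frac{(\lambda/2)^j}{j!}\,\frac{1}{B(\tfrac12\nu_1+j,\,\tfrac12\nu_2)}\int_{\nu_1F_\alpha/(\nu_1F_\alpha+\nu_2)}^{1} x^{\frac12\nu_1-1+j}(1-x)^{\frac12\nu_2-1}\,dx, \quad \text{from (50)},$$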


and consider its inverse, i.e. $\lambda = \lambda(\nu_1, \nu_2, \alpha, \beta)$. Generally, $\lambda$ has to be obtained by inverse interpolation from tables of $\beta$ such as Tang's. The interval of tabulation of 0.5 for

$$\phi = \sqrt{\lambda/(\nu_1+1)}$$

in Tang's tables is not fine enough for interpolation to be satisfactory. Still, they give a trial value of $\phi$ for which $\beta$ is calculated and then corrected with the help of the derivative $d\beta/d\phi$. Following this rather laborious method, Emma Lehmer (1944) has tabled $\phi$ for $\alpha = 0.01, 0.05$ and $\beta = 0.7, 0.8$, and for a wide range of $\nu_1$ and $\nu_2$. For these two values of the power we may use her tables to obtain our $\lambda$. It would clearly be of value for these tables to be extended.
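To check the approximation, the Poisson-weighted series quoted above can be summed directly. The sketch truncates the series once the remaining Poisson weights are negligible. It evaluates each incomplete B-function with a hand-written continued fraction and finds $F_\alpha$ by bisection. Both are illustrative substitutes for Tang's tables and for Pearson's Tables of the Incomplete B-Function, and the function names are hypothetical.

```python
import math


def betainc(a, b, x):
    """Incomplete B-function ratio I_x(a, b) by a continued fraction (Lentz's method)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        return 1.0 - betainc(b, a, 1.0 - x)  # symmetry I_x(a,b) = 1 - I_{1-x}(b,a)
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x)) / a
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 1000):
        for num in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                    -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-14:
            break
    return front * h


def f_sf(c, d1, d2):
    """Upper tail P(F > c) of the central F-distribution."""
    return 1.0 if c <= 0.0 else betainc(0.5 * d2, 0.5 * d1, d2 / (d2 + d1 * c))


def f_upper(alpha, d1, d2):
    """Upper alpha percentage point of F, by bisection."""
    lo, hi = 0.0, 1.0
    while f_sf(hi, d1, d2) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f_sf(mid, d1, d2) > alpha else (lo, mid)
    return 0.5 * (lo + hi)


def approx_power(lam, nu1, nu2, f_alpha):
    """Approximate power (61)-(62), for comparison."""
    nu = (nu1 + lam) ** 2 / (nu1 + 2.0 * lam)
    x = (nu1 + 2.0 * lam) * nu2 / ((nu1 + 2.0 * lam) * nu2 + (nu1 + lam) * nu1 * f_alpha)
    return betainc(0.5 * nu2, 0.5 * nu, x)


def exact_power(lam, nu1, nu2, f_alpha, tol=1e-13):
    """Sum of the Poisson-weighted incomplete B-function series for the power."""
    x0 = nu1 * f_alpha / (nu1 * f_alpha + nu2)
    weight = math.exp(-0.5 * lam)            # Poisson(lam/2) probability of j = 0
    total, j = 0.0, 0
    while True:
        # normalised integral from x0 to 1 of x^(nu1/2 - 1 + j) (1 - x)^(nu2/2 - 1)
        total += weight * (1.0 - betainc(0.5 * nu1 + j, 0.5 * nu2, x0))
        j += 1
        weight *= 0.5 * lam / j
        if j > 0.5 * lam and weight < tol:   # past the Poisson mode and negligible
            break
    return total


if __name__ == "__main__":
    f_alpha = f_upper(0.05, 4, 45)
    for lam in (0.0, 5.0, 10.0, 20.0):
        print(lam, round(exact_power(lam, 4, 45, f_alpha), 4),
              round(approx_power(lam, 4, 45, f_alpha), 4))
```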

We may, however, for any set of values of $\nu_1$, $\nu_2$, $\alpha$ and $\beta$, get $\lambda$ approximately with the help of the approximate form of $\beta$ given in (61). Taking a trial value of $\lambda$, we can find two consecutive integers $\lambda_1$, $\lambda_2$ between which $\lambda$ lies by the following method. From the expression (61) for $\beta$ we see that $\lambda$ must satisfy the relation

$$F_\beta(\nu, \nu_2) = \frac{\nu_1}{\nu_1+\lambda}\,F_\alpha(\nu_1, \nu_2), \qquad (66)$$

where the arguments $\nu, \nu_2$ and $\nu_1, \nu_2$ are the degrees of freedom. Hence the two integers $\lambda_1$ and $\lambda_2$ would make the right-hand side of (66) just greater and just less than the left-hand side. These can be got by trial and error, taking the $\alpha$ and $\beta$ percentage points from the F-tables and comparing the two sides. (It is to be noted that $\nu$ in (66) involves $\lambda$.) For these values of $\lambda_1$ and $\lambda_2$, $\beta$ is then evaluated using (62), and $\lambda$ is determined by backward interpolation.
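The inverse problem can also be sketched by machine under the same approximation. This sketch substitutes a different technique for the one described above. Instead of bracketing $\lambda$ between consecutive integers and interpolating backwards, it bisects on $\lambda$ directly until the approximate power (61)–(62) equals the target $\beta$. It also reports Tang's $\phi = \sqrt{\lambda/(\nu_1+1)}$ for comparison with Lehmer's tables. The function names are illustrative.

```python
import math


def betainc(a, b, x):
    """Incomplete B-function ratio I_x(a, b) by a continued fraction (Lentz's method)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        return 1.0 - betainc(b, a, 1.0 - x)  # symmetry I_x(a,b) = 1 - I_{1-x}(b,a)
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x)) / a
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 1000):
        for num in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                    -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-14:
            break
    return front * h


def f_sf(c, d1, d2):
    """Upper tail P(F > c) of the central F-distribution."""
    return 1.0 if c <= 0.0 else betainc(0.5 * d2, 0.5 * d1, d2 / (d2 + d1 * c))


def f_upper(alpha, d1, d2):
    """Upper alpha percentage point of F, by bisection."""
    lo, hi = 0.0, 1.0
    while f_sf(hi, d1, d2) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f_sf(mid, d1, d2) > alpha else (lo, mid)
    return 0.5 * (lo + hi)


def approx_power(lam, nu1, nu2, f_alpha):
    """Approximate power (61)-(62) for non-centrality lam."""
    nu = (nu1 + lam) ** 2 / (nu1 + 2.0 * lam)
    x = (nu1 + 2.0 * lam) * nu2 / ((nu1 + 2.0 * lam) * nu2 + (nu1 + lam) * nu1 * f_alpha)
    return betainc(0.5 * nu2, 0.5 * nu, x)


def solve_lambda(beta, nu1, nu2, alpha=0.05):
    """lambda at which the approximate power equals beta (bisection on lambda)."""
    if not alpha < beta < 1.0:
        raise ValueError("need alpha < beta < 1")
    f_alpha = f_upper(alpha, nu1, nu2)
    lo, hi = 0.0, 1.0
    while approx_power(hi, nu1, nu2, f_alpha) < beta:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if approx_power(mid, nu1, nu2, f_alpha) < beta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    lam = solve_lambda(0.9, 4, 45)
    phi = math.sqrt(lam / (4 + 1))           # Tang's phi for nu1 = 4
    print(f"lambda = {lam:.2f}, phi = {phi:.3f}")
```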

To deal with inverse problems, such as (b) mentioned above, a graphical representation of the relation between $\nu_1$, $\nu_2$ and $\lambda$ for fixed $\alpha$ and $\beta$ will be most useful. Following the procedure described above for finding $\lambda$, charts have been constructed for $\alpha = 0.05$ and for two levels of power, $\beta = 0.5$ and $0.9$, which are likely to be of practical interest (see Figs. 3(a), (b)). The charts give, to the approximation involved in (61), contours of equal power and could be used for determining any one of the three quantities $\nu_1$, $\nu_2$ and $\lambda$, given the other two. When $\nu_2 = \infty$, $F'$ reduces to $\chi'^2/\nu_1$, and hence these charts could also be used for answering the inverse questions connected with the power function of the $\chi^2$-test (see p. 215).

We give here two illustrations of the use of these charts. 





Illustration 1. To study the seasonal variation in the frequency of occurrence of a particular dominant alga in a pond, ten samples of 15 c.c. of water are taken from the pond on the first day of each of the five months, April to August. Fifteen drops are taken on slides from each sample after shaking it thoroughly, the number of algae of the particular form is counted under the microscope, and the total for the fifteen slides is taken as the density for each sample.

[Fig. 3(a). Contours of equal power for the analysis of variance test with the systematic set-up: $\alpha = 0.05$, and a power $\beta(\nu_1, \nu_2, \lambda) = 0.5$.]

To test whether there is significant variation in the density of this form of algae from month to month, the analysis of variance test is applied, say, at the 5% level. It will be of interest to know how large the ratio of the seasonal variability to the variability in the pond should be, so that we could detect it with a 90% chance.




Here, $\nu_1 = k - 1 = 4$, $\nu_2 = k(n-1) = 45$. For these, the chart of Fig. 3(b) gives $\lambda = 16.8$, from which we find the ratio of between-month to within-month variability

$$S^2/\sigma_0^2 = \lambda/nk = 0.34, \text{ nearly}.$$

[Fig. 3(b). Contours of equal power for the analysis of variance test with the systematic set-up: $\alpha = 0.05$, and a power $\beta(\nu_1, \nu_2, \lambda) = 0.9$.]

This means that the odds are 9 to 1 on detecting differences at the 5% level if the s.d. of the density of the algae between months was 0.58 of the s.d. within the pond. On the other hand, using Fig. 3(a) we see that there would be a 50:50 chance of detecting differences if the s.d. between months is 0.38 of that of a single sample in a month ($S^2/\sigma_0^2 = 0.145$).
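The arithmetic of Illustration 1 can be checked in a few lines. The only inputs are $\lambda = 16.8$ and the ratio 0.145, both read from the charts in the text. The sketch simply applies $S^2/\sigma_0^2 = \lambda/nk$ and takes square roots to express the result as a ratio of standard deviations.

```python
import math

k, n = 5, 10                      # five months (April to August), ten samples a month
nu1, nu2 = k - 1, k * (n - 1)     # degrees of freedom: 4 and 45

lam_90 = 16.8                     # lambda read from the beta = 0.9 chart, Fig. 3(b)
ratio_90 = lam_90 / (n * k)       # S^2 / sigma0^2 = lambda / nk
sd_ratio_90 = math.sqrt(ratio_90) # between-month s.d. as a fraction of within-pond s.d.

ratio_50 = 0.145                  # S^2 / sigma0^2 quoted for the beta = 0.5 chart, Fig. 3(a)
sd_ratio_50 = math.sqrt(ratio_50)
lam_50 = ratio_50 * n * k         # the lambda that this ratio implies

print(nu1, nu2)
print(f"beta = 0.9: S^2/sigma0^2 = {ratio_90:.3f}, s.d. ratio = {sd_ratio_90:.2f}")
print(f"beta = 0.5: lambda = {lam_50:.2f}, s.d. ratio = {sd_ratio_50:.2f}")
```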

Illustration 2. There are seven machines producing copper wire for electric cables. It is intended to control the variability in the thickness of the wire due to the machines by taking samples from time to time and testing for differences between the machines. From previous observations we have some idea of the order of variability in the product of a single machine; suppose we do not regard the variability between machines as serious if it does not exceed 0.25 of the within-machine variability. How many samples of wire must we take from each machine to have a 90% chance of detecting, at the 5% level for F, a between-machine variability of this magnitude, if it exists?

In virtue of (58), we have now to find $n$ satisfying the relation

$$\frac{\lambda}{\nu_2} = \frac{n}{n-1} \times 0.25.$$

Following the contour in chart 3(b) for $\nu_1 = 6$, we find by inspection a point on it at which the ratio of the co-ordinates is nearly 0.25. This point gives $\nu_2 = 75$, from which we obtain the number of samples required, $n = \nu_2/k + 1 = 75/7 + 1 = 12$, approximately. On the other hand, from 3(a) we find that we would have a 50% chance of detection if $n = 6$.
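Illustration 2 can also be approached without the chart. This minimal sketch assumes the approximation (61)–(62) with $\lambda = knS^2/\sigma_0^2$ and $\nu_2 = k(n-1)$. It steps through $n$ until the approximate power reaches 0.9, and finds $F_\alpha$ by bisection rather than from tables. The chart reading in the text is approximate, while this search uses the exact factor $n/(n-1)$, so the two answers need not agree to the last unit. The function names are hypothetical.

```python
import math


def betainc(a, b, x):
    """Incomplete B-function ratio I_x(a, b) by a continued fraction (Lentz's method)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        return 1.0 - betainc(b, a, 1.0 - x)  # symmetry I_x(a,b) = 1 - I_{1-x}(b,a)
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x)) / a
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 1000):
        for num in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                    -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-14:
            break
    return front * h


def f_sf(c, d1, d2):
    """Upper tail P(F > c) of the central F-distribution."""
    return 1.0 if c <= 0.0 else betainc(0.5 * d2, 0.5 * d1, d2 / (d2 + d1 * c))


def f_upper(alpha, d1, d2):
    """Upper alpha percentage point of F, by bisection."""
    lo, hi = 0.0, 1.0
    while f_sf(hi, d1, d2) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f_sf(mid, d1, d2) > alpha else (lo, mid)
    return 0.5 * (lo + hi)


def approx_power(lam, nu1, nu2, f_alpha):
    """Approximate power (61)-(62) for non-centrality lam."""
    nu = (nu1 + lam) ** 2 / (nu1 + 2.0 * lam)
    x = (nu1 + 2.0 * lam) * nu2 / ((nu1 + 2.0 * lam) * nu2 + (nu1 + lam) * nu1 * f_alpha)
    return betainc(0.5 * nu2, 0.5 * nu, x)


def power_at(n, k, ratio, alpha=0.05):
    """Approximate power with n observations per group when S^2/sigma0^2 = ratio."""
    nu1, nu2 = k - 1, k * (n - 1)
    lam = k * n * ratio                      # lambda = k n S^2 / sigma0^2
    return approx_power(lam, nu1, nu2, f_upper(alpha, nu1, nu2))


def smallest_n(k, ratio, beta=0.9, alpha=0.05, n_max=500):
    """Smallest n whose approximate power reaches beta (None if n_max is too small)."""
    for n in range(2, n_max + 1):
        if power_at(n, k, ratio, alpha) >= beta:
            return n
    return None


if __name__ == "__main__":
    n = smallest_n(k=7, ratio=0.25)
    print(n, round(power_at(n, 7, 0.25), 3))
```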


In conclusion, I should like to acknowledge gratefully the help and guidance I have received from Prof. E. S. Pearson and Dr H. O. Hartley in the course of my investigations.


APPENDIX

Distribution of the sum of squares of independent normal variates with different means and variances

Let $\xi_1, \xi_2, \ldots, \xi_n$ be $n$ independent normal variates with expectations $b_1, b_2, \ldots, b_n$ and variances $v_1, v_2, \ldots, v_n$ respectively. The characteristic function of the statistic

$$\psi^2 = \sum_{j=1}^{n} \xi_j^2 \qquad (1)$$

is easily obtained. Introducing $x_j = (\xi_j - b_j)/\sqrt{v_j}$, we note that each $x_j$ follows the probability law

$$p(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac12 x^2},$$

and that $\psi^2$ in (1) can be written as

$$\psi^2 = \sum v_j (x_j + a_j)^2, \qquad (2)$$

where $a_j$ stands for $b_j/\sqrt{v_j}$. (All summations are from $j = 1$ to $n$.) The characteristic function of $\psi^2$ is given by

$$\phi(t) = \prod_j \left[\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\{it v_j (x_j + a_j)^2 - \tfrac12 x_j^2\}\,dx_j\right]. \qquad (3)$$

The $j$th integral in (3) is equal to $\sqrt{2\pi}\,(1 - 2itv_j)^{-1/2} \exp\{it v_j a_j^2/(1 - 2itv_j)\}$. Hence

$$\phi(t) = \prod_j (1 - 2itv_j)^{-1/2} \exp\left[\sum_j \frac{it v_j a_j^2}{1 - 2itv_j}\right], \qquad (4)$$

from which all the moments of the required distribution can be derived. We may represent this approximately by a $\chi^2$-distribution fitted from the first two moments, $\mu_1' = \sum v_j + \sum v_j a_j^2$ and $\mu_2 = 2\sum v_j^2 + 4\sum v_j^2 a_j^2$.
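The two moments above lead directly to the fitted approximation. Matching $\rho\chi^2_f$, with mean $\rho f$ and variance $2\rho^2 f$, gives $\rho = \mu_2/(2\mu_1')$ and $f = 2\mu_1'^2/\mu_2$. The sketch below computes these, assuming the fitted law is a scaled $\chi^2$ of that form. The function names and the Monte Carlo check of $\mu_1'$ are illustrative additions, not part of the Appendix.

```python
import math
import random


def psi2_moments(b, v):
    """Mean mu1' and variance mu2 of psi^2 = sum xi_j^2, xi_j ~ N(b_j, v_j) independent."""
    a2 = [bj * bj / vj for bj, vj in zip(b, v)]              # a_j^2 = b_j^2 / v_j
    mu1 = sum(v) + sum(vj * aj2 for vj, aj2 in zip(v, a2))
    mu2 = 2 * sum(vj * vj for vj in v) + 4 * sum(vj * vj * aj2 for vj, aj2 in zip(v, a2))
    return mu1, mu2


def scaled_chi2_fit(mu1, mu2):
    """rho and f such that rho * chi^2_f has mean mu1 and variance mu2."""
    rho = mu2 / (2.0 * mu1)
    f = 2.0 * mu1 * mu1 / mu2
    return rho, f


def simulate_mean(b, v, reps=100_000, seed=1):
    """Monte Carlo estimate of E(psi^2) (hypothetical check, not from the paper)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        total += sum(rng.gauss(bj, math.sqrt(vj)) ** 2 for bj, vj in zip(b, v))
    return total / reps


if __name__ == "__main__":
    b, v = [1.0, 2.0], [1.0, 4.0]
    mu1, mu2 = psi2_moments(b, v)
    print(mu1, mu2, scaled_chi2_fit(mu1, mu2), round(simulate_mean(b, v), 2))
```

When every $b_j = 0$ and every $v_j = 1$, the fit returns $\rho = 1$ and $f = n$, the exact central $\chi^2$.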




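Next we consider the conditional distribution of $\psi^2$ in (2) subject to a single linear constraint on the $x_j$'s, viz. $\sum c_j(x_j + a_j) = p$.

The characteristic function of the joint distribution of $\psi^2$ and $p$ is given by

$$\phi(t, t_1) = \prod_j \left[\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\{-\tfrac12 x_j^2 + it v_j (x_j + a_j)^2 + it_1 c_j (x_j + a_j)\}\,dx_j\right]. \qquad (5)$$

On performing the integrations in (5) we find

$$\phi(t, t_1) = \prod_j (1 - 2itv_j)^{-1/2} \exp\left[\sum_j \frac{2itv_j a_j^2 + 2it_1 c_j a_j - t_1^2 c_j^2}{2(1 - 2itv_j)}\right].$$

The conditional characteristic function of $\psi^2$, for fixed $p$ (Bartlett, 1938), is

$$\phi(t \mid p) = \frac{\int_{-\infty}^{\infty} e^{-it_1 p}\,\phi(t, t_1)\,dt_1}{\int_{-\infty}^{\infty} e^{-it_1 p}\,\phi(0, t_1)\,dt_1}. \qquad (6)$$

The moments of the conditional distribution of $\psi^2$ can then be obtained from (6).

Again, we may fit a Type III curve to the conditional distribution of $\psi^2$ by using the first two moments.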


REFERENCES 

Bartlett, M. S. (1938). J. Lond. Math. Soc. 13, 62.

Cornish, E. A. & Fisher, R. A. (1937). Rev. Inst. Int. Statist. 5, 307.

Daniels, H. E. (1939). J.R. Statist. Soc. Suppl. 6, 186.

David, F. N. (1947). Biometrika, 34, 339.

Fisher, R. A. (1928). Proc. Roy. Soc. A, 121, 664.

Fisher, R. A. (1931). Introduction to the Brit. Ass. Math. Tables, 1, 20.

Goldberg, H. & Levine, H. (1946). Ann. Math. Statist. 17, 216.

Haldane, J. B. S. (1937). Biometrika, 29, 133.

Hartley, H. O. (1948). Biometrika, 35, 417.

Hsu, P. L. (1941). Biometrika, 32, 62.

Johnson, N. L. (1948). Biometrika, 35, 80.

Johnson, N. L. & Welch, B. L. (1939). Biometrika, 31, 362.

Jørgensen, N. R. (1916). Undersøgelser over Frequensflader og Korrelation. Copenhagen: Busck.

Lehmer, Emma (1944). Ann. Math. Statist. 15, 388.

Merrington, Maxine & Thompson, Catherine M. (1943). Biometrika, 33, 73.

Neyman, J. (1936). J.R. Statist. Soc. Suppl. 2, 131.

Neyman, J. & Tokarska, B. (1936). J. Amer. Statist. Ass. 31, 318.

Patnaik, P. B. (1948). Biometrika, 35, 167.

Pearson, K. (1900). Phil. Mag. 50, 157.

Pearson, K. (1916). Biometrika, 11, 146.

Pearson, K. (1922). Tables of the Incomplete Gamma Function. London: Biometrika.

Pearson, K. (1934). Tables of the Incomplete B-Function. London: Biometrika.

Tang, P. C. (1938). Statist. Res. Mem. 2, 126.

Thompson, Catherine M. (1941). Biometrika, 32, 188.

Wishart, J. (1932). Biometrika, 24, 441.





MISCELLANEA 

On a method of estimating frequencies 

By D. J. FINNEY 


Haldane (1946) has discussed the estimation of the frequency of an attribute by inverse binomial sampling, 
a method which requires that random sampling be continued until a specified quota of individuals with 
the attribute has been obtained. For example, in a study of an abnormality of blood cells which affects 
a proportion p of red corpuscles, counts might be made until m abnormal cells had been recorded. The 
probability that exactly n cells in all are counted in order to give this quota of abnormals is

$$w_n = \binom{n-1}{m-1} p^m q^{n-m} \qquad (n = m, m+1, \ldots),$$

where q = 1 − p. Haldane showed that

$$x = \frac{m-1}{n-1}$$

is an unbiased estimate of p. He then investigated the variance of x, but did not give an unbiased estimate of this variance as a function of x.

The study of the sampling distribution of x may be assisted by consideration of functions U(α, β) defined by

[definition illegible in source]

The average value of U(α, β) is

$$E\{U(\alpha,\beta)\} = \sum_n w_n\,U(\alpha,\beta) = p^\alpha q^\beta.$$

Haldane's formula (2) for σ², the variance of x, may alternatively be derived by expanding x² in a series of U(2, β), namely,

$$x^2 = U(2,0) + \frac{U(2,1)}{m} + \frac{2!\,U(2,2)}{m(m+1)} + \frac{3!\,U(2,3)}{m(m+1)(m+2)} + \cdots,$$

whence

$$\sigma^2 = E(x^2) - p^2 = \frac{p^2 q}{m}\left[1 + \frac{2!\,q}{m+1} + \frac{3!\,q^2}{(m+1)(m+2)} + \cdots\right].$$

Define

$$s^2 = x^2 - U(2,0);$$

it is then apparent that E(s²) = σ².

This s², an unbiased estimate of the variance of x, differs from the function obtained by substitution of x for p in the formula for σ². Indeed, s² can be expressed very simply in terms of the sample, since

$$s^2 = \left(\frac{m-1}{n-1}\right)^2 - \frac{(m-1)(m-2)}{(n-1)(n-2)} = \frac{(m-1)(n-m)}{(n-1)^2(n-2)} = \frac{x(1-x)}{n-2}.$$

This is the most convenient form for computation, but for the planning of a sampling investigation an expression in terms of m is more suitable:

$$s^2 = \frac{x^2(1-x)}{m-1-x}.$$

The exact formula for s² shows that Haldane's approximation is slightly biased. His conclusions about the choice of a quota in order to give a specified precision for the estimate of p are not materially altered. Since

$$\frac{s}{x} = \sqrt{\frac{1-x}{m-1-x}},$$

which is approximately equal to $\sqrt{1/(m-1)}$ when x is small, an upper limit to the size of the standard error relative to x can be fixed in advance of sampling by an appropriate choice of m.
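The unbiasedness of x and of s² can be checked numerically by summing over the sampling distribution $w_n$, truncated at a large n. This minimal sketch assumes a quota m ≥ 3 so that n − 2 > 0 for every possible n. The function names and the values m = 5, p = 0.2 are illustrative. It also prints the planning approximation $s/x \simeq 1/\sqrt{m-1}$ for a few quotas.

```python
import math


def inverse_binomial_estimates(m, n):
    """x = (m-1)/(n-1) and the unbiased variance estimate s^2 = x(1-x)/(n-2)."""
    if m < 3:
        raise ValueError("this sketch needs a quota m >= 3 so that n - 2 > 0")
    x = (m - 1) / (n - 1)
    s2 = x * (1 - x) / (n - 2)
    return x, s2


def exact_expectations(m, p, n_max=5000):
    """E(x), var(x) and E(s^2), summing w_n = C(n-1, m-1) p^m q^(n-m) for m <= n <= n_max."""
    q = 1.0 - p
    e_x = e_x2 = e_s2 = 0.0
    for n in range(m, n_max + 1):
        log_w = (math.lgamma(n) - math.lgamma(m) - math.lgamma(n - m + 1)
                 + m * math.log(p) + (n - m) * math.log(q))
        w = math.exp(log_w)
        x, s2 = inverse_binomial_estimates(m, n)
        e_x += w * x
        e_x2 += w * x * x
        e_s2 += w * s2
    return e_x, e_x2 - p * p, e_s2        # var(x) uses E(x) = p


if __name__ == "__main__":
    e_x, var_x, e_s2 = exact_expectations(m=5, p=0.2)
    print(f"E(x) = {e_x:.10f}, var(x) = {var_x:.10f}, E(s^2) = {e_s2:.10f}")
    # planning: relative standard error s/x is roughly 1/sqrt(m - 1) when x is small
    for m in (5, 10, 26, 101):
        print(m, round(1 / math.sqrt(m - 1), 3))
```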

The standard error is a satisfactory indicator of the error of estimation of p only when m is large. For small m, limits of error (analogous to fiducial limits but logically distinct because of the discrete nature of the distribution) can be read from Fisher & Yates's (1948) Table VIII by the following rules:

(i) The lower limit is the lower limit for a direct binomial sample which has m successes in n trials; enter the table with a = m, N = n.

(ii) The upper limit is the upper limit for a direct binomial sample which has (m − 1) successes in (n − 1) trials; enter the table with a = m − 1, N = n − 1.

These limits are the highest and lowest values of p which just fail to be contradicted by the sample, in a significance test based upon the chosen level of the probability (Finney, 1947).

In direct binomial sampling, the total number of individuals bearing the attribute, in a sample of N, is usually the only record that is made. Since the practice of inverse sampling involves the collection of the data in order, the sample can also provide evidence of whether the condition of independence of successive observations is fulfilled. If successive individuals are independent of one another, the (m − 1) having the attribute should be distributed at random intervals throughout the first (n − 1) counted; a departure from independence, such as would result from a clustering of abnormal cells, will increase the frequency of short and of long intervals between these individuals at the expense of intervals of moderate length. A test of significance might be based upon the observed frequency with which abnormals are preceded and followed by normals (Wishart & Hirschfeld, 1936; Iyer, 1947), or on some other statistic based upon the length of intervals. Of course, significant deviation from the value predicted by a hypothesis of randomness of intervals, whether resulting from a clustering of abnormals or (a less common phenomenon) from an exceptional regularity of intervals, would indicate that the standard error and limits of error discussed in this note were not applicable.

REFERENCES 

Finney, D. J. (1947). Errors of estimation in inverse sampling. Nature, Lond., 160, 196.

Fisher, R. A. & Yates, F. (1948). Statistical Tables for Agricultural, Biological and Medical Research, 3rd ed. Edinburgh: Oliver and Boyd.

Haldane, J. B. S. (1946). On a method of estimating frequencies. Biometrika, 33, 222–6.

Iyer, P. V. K. (1947). Random association of points on a lattice. Nature, Lond., 160, 714.

Wishart, J. & Hirschfeld (Hartley), H. O. (1936). A theorem concerning the distribution of joins between line segments. J. Lond. Math. Soc. 11, 227–36.


A further note on the mean deviation from the median 

By K. R. NAIR 

In an earlier (1947) note the author compared the standard errors of unbiased estimates of the standard deviation σ of a normal population obtained from the ‘mean deviation from mean’, m, and the ‘mean deviation from median’, m′, in a random sample of size n, when n = 2, 3 and 4. The problem reduced to the comparison of coefficients of variation of m and m′.

(i) When n = 2, m and m′ are identical, and hence so are their standard errors and coefficients of variation.

(ii) When n = 3, m and m′ are not identical. The c.v. of m is

[equation (1) illegible in source]

The author was not aware of any exact formula for the c.v. of m′, but from published numerical tables found it to agree with the right-hand side of (1) to five places of decimals. A recent paper by Jones* (1948) enables us to calculate the exact expression for the c.v. of m′ when n = 3. It comes out to be identical with the left-hand side of (1).

(iii) When n = 4, Jones has derived expressions for the second moments and the product moments of order statistics, but not for the first moments. Hence no exact expression for the c.v. of m′ when n = 4 could be derived from his results. The numerical value the author calculated from the exact distribution of m′ in his (1947) note is perhaps the best approximation known so far.

* I am indebted to Dr Churchill Eisenhart for drawing my attention to Mr Jones’s paper. 





(iv) The exact sampling distribution of m′ has been worked out by Godwin (unpublished; see Nair, 1948), but its probability integral is not very easy to tackle when n > 4. Exact expressions for the s.e. of m′ when n ≥ 4 are also not available at present.

A recent paper by Hastings, Mosteller, Tukey & Winsor (1947) gives tables of means, variances and covariances of order statistics for samples of size 2 ≤ n ≤ 10. The covariances are believed to be correct to within 1 unit in the second decimal (except for one or two values which may be off by two units).

With the help of these tables, the c.v. of m′ was calculated for n = 2 to 10 to two places of decimals. The values are given in Table 1 alongside the corresponding values of the c.v. of m, obtained by substitution in the exact formula for the latter.


Table 1. Coefficient of variation of m and m′

n      c.v. of m      c.v. of m′
2      0.76           0.75
3      0.52           0.52
4      0.43           0.43
5      0.37           0.37
6      0.33           0.34
7      0.31           0.31
8      0.28           0.29
9      0.27           0.26
10     0.25           0.25


There is an error of 1 unit in the second decimal in the value of the c.v. of m′ for n = 2, since theoretically it should have been the same as the c.v. of m. For n = 3 the two c.v.'s are also theoretically identical, and Table 1 is in agreement with this fact. The c.v. of m′ for n = 4 obtained in the author's (1947) note was 0.44, compared to the value 0.43 given in Table 1. It is believed that the error is within 1 unit in the second decimal in the values given in col. (3) for n > 4.

The closeness of the values of the c.v. of m and m′ in the range of sample sizes 2 ≤ n ≤ 10 reinforces the strong practical grounds for using m′ rather than m, namely its greater simplicity in calculation.
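The note does not reproduce the exact formula it used for the c.v. of m. As a hedged cross-check of the second column of Table 1, the sketch below assumes the standard results for a normal sample, quoted from the general literature rather than from this note:

- $E(m) = \sigma\sqrt{2(n-1)/(\pi n)}$
- $\operatorname{var}(m) = \frac{2\sigma^2(n-1)}{\pi n^2}\left[\frac{\pi}{2} + \sqrt{n(n-2)} - n + \arcsin\frac{1}{n-1}\right]$

σ cancels in the ratio, and the function name is illustrative.

```python
import math


def cv_mean_deviation(n):
    """Coefficient of variation of m, the mean deviation from the sample mean,
    for a normal sample of size n (assumed standard formulae; sigma cancels)."""
    mean = math.sqrt(2.0 * (n - 1) / (math.pi * n))
    var = (2.0 * (n - 1) / (math.pi * n * n)) * (
        math.pi / 2 + math.sqrt(n * (n - 2)) - n + math.asin(1.0 / (n - 1)))
    return math.sqrt(var) / mean


if __name__ == "__main__":
    for n in range(2, 11):
        print(n, f"{cv_mean_deviation(n):.2f}")
```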


REFERENCES 

Hastings, C., Mosteller, F., Tukey, J. W. & Winsor, C. P. (1947). Ann. Math. Statist. 18, 413.

Jones, H. L. (1948). Ann. Math. Statist. 19, 270.

Nair, K. R. (1947). Biometrika, 34, 360.

Nair, K. R. (1948). Biometrika, 35, 118.






REVIEWS 

Theory of Probability. By Harold Jeffreys. (Second edition.) Oxford University
Press. Price 30s. 

The first edition of this book appeared in 1940 and was missed by many (including the reviewer) who 
were on war service. This second edition, according to the author, has had added to it further arguments 
which go far towards establishing the principle of inverse probability; also a theory of invariance has been 
developed and applied to problems of estimation and significance. On reading the book one cannot help 
being sharply aware of the breadth of reading and the width of scientific knowledge which is the 
author’s, but the reviewer, at any rate, remains unrepentantly unconvinced by the subject-matter. 
Prof. Jeffreys’s approach to the theory of probability is too well known to require exposition here. 
Generally, it may be summed up by quoting the idea so often expressed in this book—‘no probability 
is simply a frequency’. This is a perfectly legitimate point of view, even if it is not shared by many 
whose business in life is the provision of an objective criterion in the shape of a probability figure which 
is to serve as a guide for future action on the part of others. In practice, the writer suspects, Prof.
Jeffreys would calculate a probability in the same way as other persons who follow (say) the frequency 
approach to the subject, but in theory it would be possible for two persons following Jeffreys to obtain 
different values for the same probability. For the author writes: ‘Our main postulates are the existence 
of unique reasonable degrees of belief, which can be put in a definite order.’ It does not seem possible 
so to standardize degrees of belief that the order will be the same for each and every person, and it 
would appear therefore that the uniqueness can really only apply to the single person. Actually the 
calculation and interpretation of a probability is a compound process. By some objective standards 
a numerical figure is reached, and should be reached, by a logical process which in its application leads 
all persons to the same conclusion. It is in the interpretation of this numerical figure that the subjective 
factor will enter, the final judgement being the result of a complexity of impressions in the interpreter’s 
mind, varying from his psychological make-up to the consequences involved by the taking of a wrong 
decision. For each person this interpretation must inevitably be different. 

Prof. Jeffreys writes at length both on theories of testing hypotheses and of estimation. His ideas 
on the Neyman-Pearson concept of a power-function seem a little hazy, but much of what he has to 
say is both pertinent and interesting. It is, however, a little disconcerting to be told by the foremost 
exponent of inverse probability and the applications of Bayes’s theorem that there is general agreement 
between himself and Prof. R. A. Fisher. Many of us first disentangled ourselves from Bayes with the 
help of R. A. Fisher’s papers on inverse probability, I think in particular of ‘Uncertain Inference’
(Proc. Amer. Acad. Arts Sci. 1930), and are inclined to find relevant the remark and the quotation
made by Coolidge: ‘We use Bayes’s formula with a sigh, as the only thing available under the circumstances: “Steyning tuk him for the reason the thief tuk the hot stove—bekaze there was nothing else
that season.”’ But it may be that until the statistical theory of estimation is placed on more secure 
foundations, the battle of Bayes will have to be fought by each generation. 

This book is not suitable for students starting to read probability and statistics. It may profitably 
be read by those of some maturity, for Prof. Jeffreys's ideas and examples will materially help in the
interpretation of statistical theory no matter what school of probability the reader favours. 

F. N. DAVID 


Karl Pearson's Early Statistical Papers. 557 pp. Cambridge University Press (for the 
Biometrika Trustees). 1948. Price 21s. 

This volume should find a place on the shelves of every statistical library. It includes reprints of eleven 
of Karl Pearson’s early memoirs, from 1894 to 1916, memoirs of which reprints have for many years 
been unobtainable. They have been reproduced photographically by the litho-offset process, so that 
the reprints can be trusted beyond question as facsimiles of the originals, without any of the doubts to 
which resetting might occasionally give rise. Those reprinted from the Philosophical Transactions and 
the Drapers' Company Research Memoirs have been reproduced, apparently, with a slight reduction in 
size which makes no appreciable difference in legibility, while the page of the χ²-memoir reprinted from





the Philosophical Magazine has been enlarged sufficiently to give an increase of legibility very pleasant 
to aged sight. Messrs Bradford and Dickens can be congratulated on the excellence of the printer’s
work, and the price of the volume is astonishingly low for a quarto volume of over 550 pages. The one
disadvantage of the process used, which necessitates a coated paper, is that the volume is very heavy
(4½ lb.), but this is hardly a serious fault in a volume not intended for idle hours. In effect the papers
cover a period of eleven years only, 1894–1905, for the isolated paper of 1916 is the Second Supplement
to the Memoir on Skew Variation of 1895. 

One who took part in the work associated with several of these early papers may be forgiven for
indulging in affectionate recollections in a brief survey. The first memoir reprinted here is inevitably the 
rather insufficiently entitled ‘Contribution to the Mathematical Theory of Evolution’, which actually 
deals with the problem of dissecting a distribution assumed to be compounded of two normal curves. 
The memoir is of historical interest, apart from its special problem, as the method of moments is used for 
fitting and the term ‘standard deviation’ is introduced, I think, for the first time in print. Weldon’s 
measurements on crabs were used for illustration in the original memoir, but much fuller illustration, 
on skull measurements, was given in the Phil. Mag. paper of 1901 (6th series, vol. 1, pp. 110-24). 
I well remember the work of calculating moments up to the fifth for this memoir or for work subsequent 
thereto, and the solution of the nonic. 

The second memoir reprinted is the well-known initial memoir (1895) on Pearson’s family of skew 
curves, ‘Skew Variation in Homogeneous Material’, which was supplemented later by two further 
memoirs, published in 1901 and 1910, dealing with certain subtypes which had escaped attention. Each 
of the three memoirs is characteristically and very fully illustrated by examples, so fully that it is hardly 
too much to say that these memoirs created a revolution in the general view as to the characteristics 
of frequency distributions. Previously, the ‘normal distribution’, as implied by its name, had been 
regarded as the common type and other forms as somewhat odd and aberrant types. Henceforward, 
skew distributions of various degrees of skewness had to be recognized as the common forms, and the 
normal distribution as a highly exceptional limiting type. I recall working on some of the examples 
in the memoir of 1895, and doing some of the drawings of theoretical types, in the course of which the
approximate relation between mean, median and mode was discovered, the median being determined
by the use of an Amsler planimeter on the graph. A good deal of the preliminary work in this paper, on
formulae for ‘correcting’ moments, has been superseded, but the memoir well illustrates the flexibility 
of the method. 

After these two studies of variation in a single variate, studies in bivariate and multivariate variation
naturally follow and the third reprint (Evolution series, III, 1896) deals with that subject, but (and
this is surely quite characteristic) not in purely symbolic and mathematical terms as a general method,
but entirely with reference to the biological field, under the title ‘Regression, Heredity and Panmixia’.
The procedure may have some advantage in illustrating the application of the statistical method to 
real problems, but it certainly has the very real disadvantage of obscuring what has been discovered in 
general method. In this paper Pearson assumes the normal law and uses the form given by Bravais, 
but introduces a single symbol r to replace the quantity $S(xy)/(n\sigma_1\sigma_2)$, and shows that this expression for
r makes the observed result the most probable, and hence it is the ‘best’ value for the coefficient, and 
preferable to the method of determination used by Galton and Weldon. There is a short section on the 
probable error of the coefficient of correlation so determined, leading to a result which was recognized 
as erroneous in the subsequent memoir (IV) by Pearson and Filon. The mistaken value was actually 
the s.e. of r for assigned errors in oq and tr 2 . The fifth reprint, ‘ On the Reconstruction of the Stature of 
Prehistoric Races ’, also carries somewhat further the theory of correlation, but the story of its develop¬ 
ment is an astonishingly tangled one. 

The ground having been cleared by the discussion of the theory of frequency curves and the theory
of correlation, problems of sampling had become of much more importance, and the fourth memoir of
the ‘Evolution’ series, ‘On the probable errors of Frequency Constants and on the Influence of Random
Selection on Variation and Correlation’ (1898), is the fourth memoir selected for reprinting. It was
written, it will be remembered, in conjunction with L. N. G. Filon, who was then Pearson’s Demonstrator.
The sixth reprint is that of the classical paper on the method of testing ‘Goodness of Fit’
(Phil. Mag. 1900), to which a list of the papers on the same subject that followed it would have been
a useful appendix.

The eighth reprint is that of the Philos. Trans. memoir ‘On the Mathematical Theory of Errors of
Judgment with Special Reference to the Personal Equation’ (1902), a paper which still seems to me
one of considerable interest and importance. The paper was based on two series of experiments, the 
first carried out in 1896 by Karl Pearson, Alice Lee and myself on bisecting segments of straight lines 
by estimation, the second, in which Karl Pearson, Alice Lee and W. R. Macdonell were the observers, 
on estimating the position of a bright line on a strip. The first series seems to me particularly interesting
as showing the number of useful warnings that can be drawn from experiments, of the simplest type
requiring no special apparatus.

With the work of the ninth and tenth reprints, ‘On the Theory of Contingency’ (1904) and ‘On the
general Theory of Skew Correlation’—the memoir in which Pearson describes the correlation ratio and 
its uses—I had nothing to do, and may content myself with the bare mention of them, though both are 
of importance in the history of statistical method. Enough has been said to show the high value of the 
volume, which is a real ‘treasury’ of Pearson’s work. 


G. Udny Yule




Volume XXXVI, Parts III and IV


December 1949 


THE ESTIMATION OF THE PARAMETERS OF
TOLERANCE DISTRIBUTIONS

By D. J. FINNEY 

Lecturer in the Design and Analysis of Scientific Experiment, University of Oxford 

1. Introduction 

In many types of investigation concerning the distribution of tolerances in a population,
direct measurement of tolerance for each individual in a sample is impossible; the only
practicable procedure may be to subject individuals to specified ‘doses’ of the stimulus
under test and to record whether or not they show a particular response. For example, in
the evaluation of the toxic properties of insecticides, a different dose of an insecticide may
be given to each of several batches of insects and the numbers of deaths and survivals
recorded. Insects killed by a certain dose are known to have a tolerance lower than that dose,
and insects surviving are known to have a higher tolerance, but only the one test can be made
on each insect, and the parameters of the distribution of tolerance can therefore be estimated
only by comparison of the proportions of responses recorded for different doses. Other
examples of data of this kind have been given elsewhere (Finney, 1947a).

A method of analysis based upon the probit transformation is now commonly used to
assist the interpretation of quantal response data of the kind just described. The writer has
published (1947a) a detailed account of the standard probit technique for estimating the
parameters of a normal distribution of tolerance. The purpose of the present paper is to
derive and illustrate a general method, applicable to tolerance distributions other than the
normal, which includes various modifications in the specification of the problem such as
those already discussed by Finney (1944, 1947b) and Wadley (1949).

2. The equivalent deviate transformation 

Suppose that each member of a population has a tolerance, u, in respect of some stimulus,
and that the distribution of u depends only upon a parameter of location, $\mu$, and a parameter
of scale, $\sigma$. The distribution may then be written

$$ dF = \frac{1}{\sigma}\, f\!\left(\frac{u-\mu}{\sigma}\right) du, \qquad (1) $$

where $f(v)$ is a function containing no unknown parameters and

$$ \int_{-\infty}^{\infty} f(v)\,dv = 1. \qquad (2) $$

Either or both of the tails of the distribution may have finite limits, but for convenience
these cases may be regarded as contained within the infinite limits without affecting the
general theory; the limits of the distribution, of course, are assumed to be independent of
the parameters. The probability that a stimulus whose measure on the same scale as u is
x will produce the characteristic response in an individual chosen at random is

$$ P = \int_{-\infty}^{x} \frac{1}{\sigma}\, f\!\left(\frac{u-\mu}{\sigma}\right) du. \qquad (3) $$


Biometrika 36 





For example, in a population of insects whose tolerance of a certain insecticide is distributed
normally, the probability that a dose x would kill a particular insect is

$$ P = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(u-\mu)^2}{2\sigma^2}\right\} du. \qquad (4) $$

The units for u and x need not be those in which the experimenter habitually measures, and 
often a dose metameter, such as the logarithm of the concentration of insecticide, is required 
in order that the distribution shall take the right form. 

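Write now

$$ v = \frac{u-\mu}{\sigma} = \alpha + \beta u, \ \text{say}, \qquad (5) $$

where $\alpha = -\mu/\sigma$, $\beta = 1/\sigma$. Then the general expression for P may be written

$$ P = \int_{-\infty}^{\alpha+\beta x} f(v)\,dv. \qquad (6) $$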


For any specified function $f(v)$, a quantity Y, the equivalent deviate of P, may be defined as
a monotonic increasing function of P by the equation

$$ P = \int_{-\infty}^{Y} f(v)\,dv. \qquad (7) $$

In particular, for a normal distribution of tolerances

$$ f(v) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}v^2}, $$

and Y is the normal equivalent deviate of P, differing from the probit of P only by omission
of the conventional addition of 5. Equations (6) and (7) then show that the relationship
between the probability of response and the measure of the stimulus, which is equivalent
to a statement of the tolerance distribution, may be expressed by

$$ Y = \alpha + \beta x. \qquad (8) $$

Consequently, the procedure for estimating the parameters $\mu$ and $\sigma$ may be regarded as
finding a linear relationship between x and the Y-transform of the probability of response;
this probability is itself estimated from experiments by p, the proportion of individuals
responding to a particular dose, and equations (5) enable estimates of $\mu$ and $\sigma$ to be obtained
from those of $\alpha$ and $\beta$. When the proportion responding to a particular dose can be directly observed,
in an experiment in which a known number of individuals are given the dose and those
responding to it are counted, the sequence of calculations for solving the maximum likelihood
estimation equations is the analogue of the standard probit calculations corresponding to
the form of tolerance distribution adopted. The theory of this, with examples of its consequences
for several tolerance distributions, has been given elsewhere (Finney, 1947a, b).
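The transformation of equations (5)-(8) can be sketched in a few lines of Python. This is a minimal illustration, not Finney's procedure: it assumes the standard library's `statistics.NormalDist.inv_cdf` for the normal equivalent deviate, adds 5 to obtain the probit, and inverts equations (5) to recover $\mu$ and $\sigma$ from $\alpha$ and $\beta$. The logistic case is only an assumed example of a non-normal $f(v)$, and all function names are hypothetical.

```python
from math import log
from statistics import NormalDist
from typing import Tuple

_STD_NORMAL = NormalDist()  # standardized normal: f(v) = exp(-v*v/2) / sqrt(2*pi)


def normal_equivalent_deviate(p: float) -> float:
    """Y such that P = integral of f(v) from -inf to Y, for the normal f (equation (7))."""
    return _STD_NORMAL.inv_cdf(p)


def probit(p: float) -> float:
    """Probit: the normal equivalent deviate plus the conventional 5."""
    return normal_equivalent_deviate(p) + 5.0


def logistic_equivalent_deviate(p: float) -> float:
    """Y for the logistic f(v) = exp(-v) / (1 + exp(-v))**2 (an illustrative choice)."""
    return log(p / (1.0 - p))


def location_scale(alpha: float, beta: float) -> Tuple[float, float]:
    """Invert alpha = -mu/sigma, beta = 1/sigma (equations (5)); returns (mu, sigma)."""
    sigma = 1.0 / beta
    return -alpha * sigma, sigma


# Usage: the dose at which Y = alpha + beta*x = 0 (equation (8)) is the median tolerance mu.
mu, sigma = location_scale(alpha=-2.0, beta=4.0)
print(mu, sigma, round(probit(0.975), 3))
```

Changing only the function that maps P to Y is all that distinguishes one tolerance distribution from another in equation (8); the linear relationship in x is unchanged.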


3. The estimation of the parameters

Experimentation leading to direct observation of values of p for given values of x is not always
possible. For example, the population may contain an unknown proportion of individuals
who will respond whatever dose is given, or the total number of individuals receiving a
particular dose may have to be estimated from a parallel sample instead of being counted.
Methods of analysis for these two problems have been described (Finney, 1944; Wadley,
1949), and their similarity suggests that they are particular instances of some more general
theorem. Such a theorem will now be derived.

Suppose that the tolerance distribution for individuals is that given above as (1), so that
the probability of response at dose x due to the action of that dose is P (= $1 - Q$, say). Suppose
further that an experiment is performed in which r, the number of individuals responding,
can be observed for a series of values of the dose, x, and that, under the conditions of experiment,
the probability of exactly r responses at a particular x is

$$ P(r) = F(J + KQ); \qquad (9) $$

here J and K are additional parameters which in general will also require estimation, and
$F(\;)$ is a known function. For example, if a proportion $C_1$ of the population will respond
even to zero dose and a proportion $C_2$ is unable to respond however large the dose, then the
probability that an individual selected at random will respond when it receives a dose x is

$$ P^* = (1 - C_2) - Q(1 - C_1 - C_2). \qquad (10) $$

If n individuals are subjected to a dose x, the number of responses will have a binomial
distribution based upon this probability:

$$ P(r) = \binom{n}{r} P^{*r} Q^{*\,n-r}, \qquad (11) $$

where $Q^* = 1 - P^*$. This is of the form of equation (9), with $J = 1 - C_2$ and $K = -(1 - C_1 - C_2)$.
Unless $C_1$ and $C_2$ are known, direct estimation of P from
tests at a particular x is impossible, and the whole data must be used in order to estimate also
the additional parameters. If $C_2$ is known to be zero, for a normal distribution of tolerances
this reduces to the problem of taking account of ‘natural mortality’ in probit analysis
(Finney, 1944, 1947a, Chapter 6).
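Equations (9)-(11) can be checked numerically. The sketch below uses hypothetical function names and assumed values of $C_1$ and $C_2$. It computes $P^*$ from Q, reads off J and K by writing $P^* = J + KQ$, and evaluates the binomial probability of r responses with the standard library's `math.comb`.

```python
from math import comb


def response_probability(Q: float, c1: float, c2: float) -> float:
    """P* = (1 - C2) - Q (1 - C1 - C2), equation (10)."""
    return (1.0 - c2) - Q * (1.0 - c1 - c2)


def j_and_k(c1: float, c2: float):
    """Read J and K of equation (9) off equation (10) by writing P* = J + K*Q."""
    return 1.0 - c2, -(1.0 - c1 - c2)


def prob_r(r: int, n: int, Q: float, c1: float, c2: float) -> float:
    """Binomial probability of exactly r responses among n, equation (11)."""
    p_star = response_probability(Q, c1, c2)
    return comb(n, r) * p_star**r * (1.0 - p_star) ** (n - r)


# Usage with assumed C1 = 0.10, C2 = 0.05: at zero dose (Q = 1) P* equals C1,
# and at an overwhelming dose (Q = 0) it equals 1 - C2.
J, K = j_and_k(0.10, 0.05)
zero_dose = response_probability(1.0, 0.10, 0.05)
huge_dose = response_probability(0.0, 0.10, 0.05)
```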

The logarithm of the likelihood of the results obtained for a series of doses may be written

$$ L = S \log P(r) = S \log F, \qquad (12) $$

where the summation denoted by S may include doses for which $x \to -\infty$, for which Q is
known to be unity, or $x \to +\infty$, for which Q is known to be zero. If both J and K are unknown,
the maximum likelihood estimates of $\alpha$, $\beta$, J, K will be obtained by equating to zero the
partial differential coefficients of L with respect to each of these; if, as sometimes happens,
the conditions of the problem specify J or K exactly, the corresponding equation will be
omitted.


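Now

$$ \frac{\partial Q}{\partial \alpha} = -f(\alpha + \beta x) = -Z, \ \text{say}, \qquad \frac{\partial Q}{\partial \beta} = -xZ, \qquad (13) $$

where Z is the ordinate of the standardized tolerance distribution, $f(v)\,dv$, at $v = \alpha + \beta x$. Consequently
the maximum likelihood equations may be written

$$
\begin{aligned}
0 &= \frac{\partial L}{\partial\alpha} = -K\,S\frac{ZF'}{F}, &\qquad 0 &= \frac{\partial L}{\partial J} = S\frac{F'}{F},\\
0 &= \frac{\partial L}{\partial\beta} = -K\,S\frac{xZF'}{F}, &\qquad 0 &= \frac{\partial L}{\partial K} = S\frac{QF'}{F},
\end{aligned}
\qquad (14)
$$

where

$$ F'(\omega) = \frac{dF(\omega)}{d\omega}. \qquad (15) $$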


Equations (14) can be solved most easily by an iterative process which calculates adjustments
to provisional values of $\alpha$, $\beta$, J, K. The adjustments may be calculated from Taylor-Maclaurin
expansions to the first order, in the usual manner (cf. Finney, 1947a, Appendix II;
1947b). If J, K are approximations to the estimates of these parameters obtained at the end
of one cycle of calculations, an empirical rate of response to one of the doses tested may
be defined as p (= $1 - q$), the maximum likelihood estimate of P from the data for that dose
alone when J and K have these values. Thus q is the solution of

$$ F'(J + Kq) = 0. \qquad (16) $$

Hence, to the first order in $(Q - q)$,

$$ F'(J + KQ) = K(Q - q)\,F''(J + KQ). \qquad (17) $$

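The second differential coefficients of L, after substitution of the expected values of all
parameters, are

$$
\begin{aligned}
\frac{\partial^2 L}{\partial\alpha^2} &= K^2 S\frac{Z^2F''}{F}, &\qquad \frac{\partial^2 L}{\partial\alpha\,\partial J} &= -K S\frac{ZF''}{F},\\
\frac{\partial^2 L}{\partial\alpha\,\partial\beta} &= K^2 S\frac{xZ^2F''}{F}, &\qquad \frac{\partial^2 L}{\partial\beta\,\partial J} &= -K S\frac{xZF''}{F},\\
\frac{\partial^2 L}{\partial\beta^2} &= K^2 S\frac{x^2Z^2F''}{F}, &\qquad \frac{\partial^2 L}{\partial J^2} &= S\frac{F''}{F},\\
\frac{\partial^2 L}{\partial\alpha\,\partial K} &= -K S\frac{QZF''}{F}, &\qquad \frac{\partial^2 L}{\partial J\,\partial K} &= S\frac{QF''}{F},\\
\frac{\partial^2 L}{\partial\beta\,\partial K} &= -K S\frac{xQZF''}{F}, &\qquad \frac{\partial^2 L}{\partial K^2} &= S\frac{Q^2F''}{F}.
\end{aligned}
\qquad (18)
$$

Introduce now a weight, W, defined by

$$ W = -\frac{K^2 Z^2 F''}{F}, \qquad (19) $$

and two auxiliary variates, $t_1$, $t_2$, defined by

$$ t_1 = -\frac{1}{KZ}, \qquad t_2 = -\frac{Q}{KZ}. \qquad (20) $$

If $\alpha$, $\beta$, J, K are approximations to the maximum likelihood estimates, obtained by a
graphical method, by rough calculation, or from a previous cycle of the calculations about
to be described, the quantities defined by equations (19) and (20) may be formed for each
dose, with the aid of equations (6), (9) and (13). Adjustments to the estimates, $\delta\alpha$, $\delta\beta$, $\delta J$, $\delta K$,
are then the solution of a set of linear equations whose coefficients are given by equations
(18); these equations may be written:†

$$
\begin{aligned}
\delta\alpha\, SW + \delta\beta\, SWx + \delta J\, SWt_1 + \delta K\, SWt_2 &= SW\,\frac{Q-q}{Z},\\
\delta\alpha\, SWt_1 + \delta\beta\, SWxt_1 + \delta J\, SWt_1^2 + \delta K\, SWt_1t_2 &= SWt_1\,\frac{Q-q}{Z},\\
\delta\alpha\, SWx + \delta\beta\, SWx^2 + \delta J\, SWxt_1 + \delta K\, SWxt_2 &= SWx\,\frac{Q-q}{Z},\\
\delta\alpha\, SWt_2 + \delta\beta\, SWxt_2 + \delta J\, SWt_1t_2 + \delta K\, SWt_2^2 &= SWt_2\,\frac{Q-q}{Z}.
\end{aligned}
\qquad (21)
$$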


† If the approximations to J and K differ much from the maximum likelihood estimates, some
values of p may be negative or greater than unity. Since p, q are introduced only as aids to the solution
of the maximum likelihood equations, these values are nevertheless to be used in all that follows.


The working equivalent deviate (analogous to the working probit), y, must now be introduced.
It is defined as

$$ y = Y + \frac{Q - q}{Z}, \qquad (22) $$

where Y, Q, Z, q are determined from the approximations to the estimates of the parameters
by means of equations (8), (7), (13) and (16). The introduction of the symbol $\bar\theta$ for the weighted
mean of a variate $\theta$,

$$ \bar\theta = SW\theta / SW, \qquad (23) $$

and of $S_{\theta\phi}$ for the weighted sum of products of deviations of any two variates $\theta$, $\phi$,

$$ S_{\theta\phi} = SW\theta\phi - (SW\theta)(SW\phi)/SW, \qquad (24) $$

enables the equations, after some rearrangement, to be reduced to

$$ \alpha_1 = \bar y - \beta_1 \bar x - \delta J\, \bar t_1 - \delta K\, \bar t_2 \qquad (25) $$

and

$$
\begin{aligned}
\beta_1 S_{xx} + \delta J\, S_{xt_1} + \delta K\, S_{xt_2} &= S_{xy},\\
\beta_1 S_{xt_1} + \delta J\, S_{t_1t_1} + \delta K\, S_{t_1t_2} &= S_{t_1y},\\
\beta_1 S_{xt_2} + \delta J\, S_{t_1t_2} + \delta K\, S_{t_2t_2} &= S_{t_2y}.
\end{aligned}
\qquad (26)
$$

Here $\alpha_1$, $\beta_1$ have been written for the adjusted values $(\alpha + \delta\alpha)$, $(\beta + \delta\beta)$, but in general this is
unnecessary as there is little danger of confusion. Equations (26) show that $\beta_1$, $\delta J$, $\delta K$ are
the weighted partial regression coefficients of y on x, $t_1$, $t_2$ respectively.
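Equations (23)-(26) amount to a weighted multiple regression. The following is a minimal pure-Python sketch, assuming the working deviates y, the weights W and the auxiliary variates have already been formed for each dose. It builds the $S_{\theta\phi}$ of equation (24), solves equations (26) by Gaussian elimination, and recovers $\alpha_1$ from equation (25). It does not include the extra terms for control or maximal-dose batches discussed in the next paragraph, and the function names are illustrative only.

```python
from typing import List, Sequence, Tuple


def wmean(W: Sequence[float], v: Sequence[float]) -> float:
    """Weighted mean, equation (23)."""
    return sum(w * a for w, a in zip(W, v)) / sum(W)


def wsp(W: Sequence[float], a: Sequence[float], b: Sequence[float]) -> float:
    """Weighted sum of products of deviations S_ab, equation (24)."""
    sw = sum(W)
    swa = sum(w * ai for w, ai in zip(W, a))
    swb = sum(w * bi for w, bi in zip(W, b))
    return sum(w * ai * bi for w, ai, bi in zip(W, a, b)) - swa * swb / sw


def solve(A: List[List[float]], rhs: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(rhs)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    sol = [0.0] * n
    for i in reversed(range(n)):
        sol[i] = (M[i][n] - sum(M[i][j] * sol[j] for j in range(i + 1, n))) / M[i][i]
    return sol


def adjust(W, x, y, aux) -> Tuple[float, float, List[float]]:
    """One cycle of equations (25)-(26): weighted regression of y on x and aux.

    aux is a list of auxiliary variates, e.g. [t1, t2]; returns (alpha1, beta1, [dJ, dK]).
    """
    regressors = [x] + list(aux)
    A = [[wsp(W, a, b) for b in regressors] for a in regressors]
    rhs = [wsp(W, a, y) for a in regressors]
    beta1, *deltas = solve(A, rhs)
    alpha1 = wmean(W, y) - beta1 * wmean(W, x) - sum(
        d * wmean(W, t) for d, t in zip(deltas, aux))
    return alpha1, beta1, deltas
```

If J or K is specified, the corresponding variate is simply left out of `aux`, which mirrors the omission of an equation from (26).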

Provided that no doses are included for which P is known absolutely, instead of merely
estimated, equations (26) are complete as shown. Often, however, ‘control’ batches of
subjects are tested in the absence of the stimulus. If the dose scale is a simple measure of
weight or concentration, this will correspond to x = 0 in a tolerance distribution whose tail
does not extend below x = 0; if a logarithmic dose metameter is being used, the control
batches have $x \to -\infty$. For these controls, Q = 1 is known, and therefore they make no
contribution to information on $\alpha$ and $\beta$; it does not follow that q, derived from equation (16),
is unity, and the controls do give information on J and K. Examination of the method of
derivation of equations (26) shows that $(-F''/F)$, evaluated for Q = 1, must then be added
to $S_{t_1t_1}$, $S_{t_1t_2}$ and $S_{t_2t_2}$, and that $K(1 - q)F''/F$ must be added to $S_{t_1y}$ and $S_{t_2y}$. The other
coefficients of the equations remain unchanged. Less commonly, a maximal dose ($x \to +\infty$) is
given, for which the conditions of the problem require Q = 0 exactly. From equation (9),
it is clear that the subjects so tested can give information only on J, which is therefore in
some way a measure of that part of the population of subjects immune to the stimulus.
Only two coefficients of equations (26) are affected: $(-F''/F)$ must be added to $S_{t_1t_1}$, and
$K(1 - q)F''/F$ must be added to $S_{t_1y}$.

After the solution of the equations, a second cycle of calculation may be performed, using
$\alpha_1$, $\beta_1$, $J_1$ (= $J + \delta J$), $K_1$ (= $K + \delta K$) in place of $\alpha$, $\beta$, J, K, and iteration may continue as long
as seems desirable. Of course, in the limit $\delta J$, $\delta K$ become zero and $\beta$ becomes simply the
weighted regression coefficient of y on x, but for practical purposes the iteration need seldom
be carried to the stage at which the adjustments to J and K are negligible.

If k doses in all have been tested (including any for which $x \to -\infty$ or $x \to +\infty$), a $\chi^2$ with
$(k - 4)$ degrees of freedom will test whether the observations agree satisfactorily with
expectations based upon the parameters. This $\chi^2$ can be calculated by comparison of observed
and expected frequencies. A simpler calculation, correct when the maximum likelihood
estimates are used throughout, is

$$ \chi^2_{[k-4]} = S_{yy} - \beta S_{xy}, \qquad (27) $$

and even when the iteration is not carried to the limit a satisfactory approximation will
usually be given by

$$ \chi^2_{[k-4]} = S_{yy} - \beta S_{xy} - \delta J\, S_{t_1y} - \delta K\, S_{t_2y}. \qquad (28) $$

The $\chi^2$ calculated by either of these formulae corresponds to the usual expression

$$ \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} $$

taken over all classes, and consequently does not allow for the undue influence of very small
expected frequencies in some classes; this point has been discussed elsewhere (Finney,
1947a), and the same method of dealing with it (calculation of each expectation and grouping
of classes) is required here. Equation (28), however, is often good enough and saves the more
laborious calculations previously used (Finney, 1944; Wadley, 1949), except when borderline
values indicate the need for more detailed examination.

For data showing no heterogeneity (i.e. a non-significant $\chi^2$), the variances of the estimates
of the parameters may be found in the usual manner. In fact

$$ V(\bar y) = 1/SW, \qquad (29) $$

and the variances and covariances of $\beta$, J, K are the elements of the inverse matrix

$$ V = \begin{pmatrix} S_{xx} & S_{xt_1} & S_{xt_2}\\ S_{xt_1} & S_{t_1t_1} & S_{t_1t_2}\\ S_{xt_2} & S_{t_1t_2} & S_{t_2t_2} \end{pmatrix}^{-1}. \qquad (30) $$

V will generally be calculated as a stage in the solution of equations (26). The mean $\bar y$ is
uncorrelated with $\beta$, J, K, and the variances of $\alpha$ and of other combinations of the parameters
can readily be derived. If there is evidence of heterogeneity, but not of a kind that
makes the form of the tolerance distribution or some other characteristic of the analysis
inappropriate, all variances must be multiplied by the heterogeneity factor; no new discussion
of this point is needed (Finney, 1947a, §§ 18, 19).

If the conditions of the problem specify J or K, $\delta J$ or $\delta K$ will be everywhere zero, the
second or third equation of (26) will be omitted, and $\chi^2$ will have $(k - 3)$ degrees of freedom.
If both J and K are specified, only the first of equations (26) remains, $\chi^2$ has $(k - 2)$ degrees
of freedom, and the calculations are those for the standard probit method, or for the corresponding
method based on some non-normal tolerance distribution, just as described in
a previous paper (Finney, 1947b).


4. Applications and illustrations 
(i) Adjustment for natural mortality 

In the testing of insecticides, ‘control’ or untreated batches of insects frequently show some
deaths during the time that elapses between treatment and classification of the treated
batches. It may then be assumed that some of the deaths amongst treated insects would have
occurred even if the insects had been untreated; they are due to natural causes and not to the
insecticide. If an insect has a probability C of natural death, independently of the insecticide,
a dose of insecticide sufficient to give a probability of death P amongst insects not dying
from natural causes will have associated with it a total death-rate obtained as the particular
case of equation (10) with $C_2 = 0$:

$$ P^* = 1 - Q(1 - C). \qquad (31) $$

This assumes that the two causes of death operate independently of one another.†
If n insects all receive the same dose, and react to it independently of one another, the
probability that r of them die is

$$ P(r) = \binom{n}{r} P^{*r}(1 - P^*)^{n-r} = F\{1 - Q(1 - C)\}, \qquad (32) $$

where, in the notation of § 3,

$$ J = 1, \qquad K = -(1 - C). \qquad (33) $$


Since J is specified by the conditions of the problem, the second equation of (26) is not
required. Now

$$ \log F = \text{const.} + r\log P^* + (n - r)\log(1 - P^*). $$

Differentiation twice with respect to the argument of $F(\;)$, followed by the replacement of r,
$(n - r)$ by their expectations $nP^*$, $n(1 - P^*)$, gives the expected value

$$ \frac{F''}{F} = -\frac{n}{P^*(1 - P^*)}, $$

whence, using equations (19) and (31),

$$ W = \frac{nZ^2(1 - C)}{Q\{1 - Q(1 - C)\}}. \qquad (34) $$


This may be regarded as equivalent to a weighting coefficient, or weight per individual, w,
where

$$ w = \frac{Z^2(1 - C)}{Q\{1 - Q(1 - C)\}}, \qquad (35) $$

to be multiplied by n in order to give the total weight for the batch. When C = 0, w reduces
to the familiar formula for probit analysis, $Z^2/PQ$; the present result, however, applies to tolerance
distributions other than the normal provided that the appropriate Q and Z are used.
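As a check on equation (35), the sketch below evaluates the weight per individual for a normal tolerance distribution, taking Y as the expected probit (so the normal equivalent deviate is Y − 5). It is an illustrative computation, not the procedure used to build Finney's Table II, and the function name is hypothetical. With Y = 7.6 and C = 0.17 it gives w ≈ 0.03298 and a batch weight 142w ≈ 4.7, the values quoted in the worked example later in this section.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def weight_per_individual(Y: float, C: float) -> float:
    """w = Z^2 (1 - C) / (Q {1 - Q (1 - C)}), equation (35), for normal tolerances.

    Y is the expected probit (normal equivalent deviate plus 5);
    C is the natural response rate, as a proportion.
    """
    v = Y - 5.0
    Z = exp(-0.5 * v * v) / sqrt(2.0 * pi)  # ordinate of the standardized normal
    Q = NormalDist().cdf(-v)                 # Q = 1 - P, computed without cancellation
    return Z * Z * (1.0 - C) / (Q * (1.0 - Q * (1.0 - C)))


w = weight_per_individual(7.6, 0.17)
print(round(w, 5), round(142 * w, 1))        # 0.03298 4.7
```

Setting C = 0 in the same function reproduces the ordinary probit weight $Z^2/PQ$.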

Direct estimation of C rather than of $(1 - C)$ is convenient, and for that purpose the auxiliary
variate $t_2$ of equation (20) is modified to

$$ t = Q/Z; \qquad (36) $$

$t_1$, of course, is not required since J is known. The factor $-1/K = 1/(1 - C)$ is included in the auxiliary
variate for theoretical work, but in computation it is most easily left as a multiplier of $\delta J$,
$\delta K$, of which account may be taken later. Equations (26) now become

$$ \beta S_{xx} + \frac{\delta C}{1 - C}\, S_{xt} = S_{xy}, \qquad \beta S_{xt} + \frac{\delta C}{1 - C}\, S_{tt} = S_{ty}, \qquad (37) $$

and from equation (25),

$$ \alpha = \bar y - \beta \bar x - \frac{\delta C}{1 - C}\, \bar t. \qquad (38) $$


† Horsfall (1945) has strongly criticized this assumption and any attempts to base methods of
statistical analysis upon it. Extensive experimental studies will be needed before discussion of this 
matter is taken further; in the absence of these, equation (31) is a reasonable mathematical model to 
try, and experience has proved it to be adequate for the explanation of many sets of data. 



Thus $\beta$ and $\delta C/(1 - C)$ are the partial regression coefficients of y on x and t, and when they
have been found the revised estimate of the remaining parameter, $\alpha$, is given by equation
(38). The $S_{xx}$, etc., are weighted sums of squares and products of deviations, the weight for
any dose being nw. If some insects, say $n_c$ in all, have been included in the experiment but
kept as untreated controls in order to provide a direct estimate of C, contributions from
these will affect $S_{tt}$ and $S_{ty}$. Writing $r_c$ as the number of responses amongst these controls and

$$ c = r_c/n_c \qquad (39) $$

as the estimate of C based on the controls alone, the comments below equations (26) show
that $S_{tt}$ must be increased by

$$ \frac{n_c(1 - C)}{C}. \qquad (40) $$

Since the value of q for the controls is obtained from equation (31) by substitution of $P^* = c$,
to give

$$ q = \frac{1 - c}{1 - C}, $$

$S_{ty}$ must be increased by

$$ \frac{n_c(c - C)}{C}. \qquad (41) $$
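Formulae (39)-(41) are simple enough to compute directly. In this sketch the function name is hypothetical, and the inputs are taken from the control batch of the example that follows (129 insects, 21 affected) with the provisional C = 0.17 adopted there. The resulting increments to $S_{tt}$ and $S_{ty}$ are computed here from those figures, not read from Table 1.

```python
def control_contributions(n_c: int, r_c: int, C: float):
    """Increments to S_tt and S_ty from untreated controls, equations (39)-(41)."""
    c = r_c / n_c                      # equation (39): controls-only estimate of C
    q = (1.0 - c) / (1.0 - C)          # q for the controls, from equation (31) with P* = c
    add_tt = n_c * (1.0 - C) / C       # equation (40)
    add_ty = n_c * (c - C) / C         # equation (41)
    return add_tt, add_ty, q


add_tt, add_ty, q = control_contributions(129, 21, 0.17)
print(round(add_tt, 2), round(add_ty, 2))   # 629.82 -5.47
```

The sign of the $S_{ty}$ increment follows that of $c - C$: a control rate below the provisional C pulls the next estimate of C downward.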


The calculations based on equations (37)-(41) are equivalent to those previously given
(Finney, 1944, 1947a); the latter, however, were developed only for a normal distribution
of tolerances, whereas the present theory shows the same form of analysis to be applicable to any
distribution provided that the appropriate relationship between Y and P (equation (7))
is employed. The method of calculation now recommended is simpler than the earlier version,
especially as most of it follows the standard pattern for multiple regression.

The provisional C used in any cycle of calculations may by chance be greater than some
of the p* for low doses; for each of these, equation (16) will then give q greater than unity and
p negative. The corresponding P, being an expected value, is of course positive, and application
of the formula for the working equivalent deviate (equation (22)) will enable the results
for these doses to exert their proper influence in the calculations even though diagrammatic
representation in the usual probit or other regression diagram is impossible.

For a normal distribution of tolerances, $\mu$ and $\sigma$ are respectively the mean and standard
deviation. Values of the auxiliary variate, t, defined by equation (36), and of the weighting
coefficient, w, from equation (35), have been tabulated by Finney (1947a, Table II). As an
illustration of the calculations, the example given in Chapter 6 of that book may be analysed
again; this is of interest as showing how the natural mortality adjustment can be employed
in data for a biological assay, in which two insecticides or other preparations are tested
simultaneously.

Preparations of two derris roots, W. 213 and W. 214, were tested on the grain beetle,
Oryzaephilus surinamensis. The log concentrations, in mg. dry root per litre, are shown as
x in the first column of Table 1. The columns n, r, p* show numbers of insects tested, numbers
affected (dead + moribund + slightly affected), and percentages affected respectively. In
a control batch of 129 insects, 21 were affected, so that

$$ c = 21/129 = 16.3\,\%. $$

Inspection of all the data suggested C = 17.0 %
as a first approximation to the required estimate; values of p were then calculated from
equation (31), which may be rewritten

$$ p = \frac{p^* - 0.170}{1 - 0.170}. $$

In accordance with usual practice, these have been tabulated as percentages, instead of
proportions, though in all formulae p is used as a proportion.

The empirical probits of p (Finney, 1947a, Table I) were next tabulated, and, either by
inspection of Table 1 or by plotting these probits against x, a column of expected probits,
Y, was obtained from two parallel regression lines. The weighting coefficients, w, and values
of t for each Y were read from tables (Finney, 1947a, Table II), in order to give the next
two columns in Table 1; for example, for Y = 7.6 and C = 17 %, w = 0.03298, and multiplication
by 142 gives 4.7 as the total weight for the first batch of beetles. The working probits,
y, were also read from tables (Finney, 1947a, Table IV).
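The table look-ups just described (p from p*, and the working probit y) can also be computed directly. The sketch below assumes a normal tolerance distribution and uses equation (31) for p, equation (22) for y and equation (36) for t. The function name is hypothetical. It also shows that when p* falls below the provisional C, p is negative yet y remains finite, as noted above in the discussion of low doses.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def working_probit(p_star: float, C: float, Y: float):
    """Return (p, y, t) for one dose under the natural-mortality model.

    p_star: observed proportion affected; C: provisional natural response rate;
    Y: expected probit read from the provisional regression line.
    """
    p = (p_star - C) / (1.0 - C)          # equation (31) solved for p (may be < 0)
    q = 1.0 - p
    v = Y - 5.0                            # normal equivalent deviate
    Z = exp(-0.5 * v * v) / sqrt(2.0 * pi)
    Q = NormalDist().cdf(-v)               # expected Q = 1 - P
    y = Y + (Q - q) / Z                    # working probit, equation (22)
    t = Q / Z                              # auxiliary variate, equation (36)
    return p, y, t


# A low dose whose observed rate (10 %) is below the provisional C = 17 %:
p_low, y_low, t_low = working_probit(0.10, 0.17, 4.0)
```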

Calculation and summation of columns for nwx, nwt, nwy gave totals from which $\bar x$, $\bar t$, $\bar y$
were formed. Further multiplications by x, t, y led to sums of squares and products of deviations,
found according to equation (24). All these were formed separately for the two roots.
In order to form single estimates of $\beta$ and C, corresponding sums of squares and products
were added, together with the special contributions to $S_{tt}$ and $S_{ty}$ from the controls obtained
from formulae (39)-(41); details are shown at the bottom of Table 1. Equations (37) then
take the form

$$ 28.8295\,\beta - 76.5485\,\frac{\delta C}{1 - C} = 80.709, \qquad -76.5485\,\beta + 897.1128\,\frac{\delta C}{1 - C} = -214.659. $$

The inverse matrix of coefficients (equation (30)) is easily found as

$$ V = \begin{pmatrix} 0.0448475 & 0.0038267\\ 0.0038267 & 0.0014412 \end{pmatrix}. \qquad (42) $$

Therefore

$$ \beta = 80.709 \times 0.0448475 - 214.659 \times 0.0038267 = 2.7982, $$

and

$$ \frac{\delta C}{1 - C} = 80.709 \times 0.0038267 - 214.659 \times 0.0014412 = -0.00052. $$

Hence, by substitution of the provisional value of C,

$$ \delta C = -0.00052 \times 0.83 = -0.00043. $$

From equation (38), and Table 1, the remaining parameters for the regression lines are

$$ \alpha_1 = 5.5980 - 2.7982 \times 1.4417 + 0.0005 \times 1.1325 = 1.564 $$

and

$$ \alpha_2 = 5.8841 - 2.7982 \times 1.3017 + 0.0005 \times 1.3233 = 2.242. $$
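The arithmetic of equations (37), (38) and (42) can be reproduced from the pooled sums quoted above. This minimal sketch inverts the 2 × 2 coefficient matrix explicitly and then applies equation (38) with the per-root means $\bar y$, $\bar x$, $\bar t$ used in the calculation above. Up to rounding it reproduces β ≈ 2.7982, δC/(1 − C) ≈ −0.00052, δC ≈ −0.00043, α₁ ≈ 1.564 and α₂ ≈ 2.242. The function name is illustrative.

```python
def solve_symmetric_2x2(a11, a12, a22, b1, b2):
    """Solve [[a11, a12], [a12, a22]] [u, v] = [b1, b2]; also return the inverse V."""
    det = a11 * a22 - a12 * a12
    V = ((a22 / det, -a12 / det), (-a12 / det, a11 / det))  # cf. equation (42)
    u = V[0][0] * b1 + V[0][1] * b2
    v = V[1][0] * b1 + V[1][1] * b2
    return u, v, V


# Pooled sums for the derris example, as quoted in equations (37) above.
S_xx, S_xt, S_tt = 28.8295, -76.5485, 897.1128
S_xy, S_ty = 80.709, -214.659
beta, rel_dC, V = solve_symmetric_2x2(S_xx, S_xt, S_tt, S_xy, S_ty)

C_prov = 0.17
dC = rel_dC * (1.0 - C_prov)                          # delta C

# Equation (38) for each root, with ybar, xbar, tbar as quoted in the text.
alpha1 = 5.5980 - beta * 1.4417 - rel_dC * 1.1325     # root 1
alpha2 = 5.8841 - beta * 1.3017 - rel_dC * 1.3233     # root 2
print(round(beta, 4), round(rel_dC, 5), round(alpha1, 3), round(alpha2, 3))
```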



Table 1. Toxicity of derris roots W. 213 and W. 214 to Oryzaephilus surinamensis

(p* and p are shown as percentages in the table, but in formulae are used as proportions)


[Body of Table 1 not recoverable from the source scan. Per the text, it gives for each root x, n, r, p*, p, the empirical probit, Y, nw, t and y, with the sums of squares and products and the control contributions at the foot.]


The revised regression equations are therefore

$$ Y_1 = 1.564 + 2.798x, \qquad Y_2 = 2.242 + 2.798x, $$

and the estimate of the response rate amongst controls is

$$ C = 0.17 - 0.0004 = 16.96\,\%. $$

Evaluation of $Y_1$ and $Y_2$ for the appropriate values of x gives a new set of expected probits
which are very close to those in the Y column of Table 1, and the revised estimate of C is
almost identical with the earlier approximation. Consequently, there is no need to undertake
any further cycle of calculations. As a test of heterogeneity, derived from equation (28),

$$ \chi^2_{[6]} = S_{yy} - \beta S_{xy} - \frac{\delta C}{1 - C}\, S_{ty} \qquad (43) $$

$$ = 231.56 - 2.7982 \times 80.709 - 0.00052 \times 214.659 = 5.60. $$
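Equation (43), with the sums quoted above, can be evaluated as follows. The tail probability uses the closed form for a $\chi^2$ with an even number ν of degrees of freedom, $e^{-x/2}\sum_{i<\nu/2}(x/2)^i/i!$, a standard identity rather than anything in the paper. The computed $\chi^2$ is about 5.61, differing from the printed 5.60 only in the last figure, and its upper-tail probability is roughly 0.47.

```python
from math import exp, factorial


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability for even df: exp(-x/2) * sum_{i<df/2} (x/2)^i / i!."""
    if df % 2:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    h = x / 2.0
    return exp(-h) * sum(h**i / factorial(i) for i in range(df // 2))


S_yy, S_xy, S_ty = 231.56, 80.709, -214.659
beta, rel_dC = 2.7982, -0.00052
chi2 = S_yy - beta * S_xy - rel_dC * S_ty   # equation (43)
df = 10 - 4                                  # 10 dose levels, 4 parameters
print(round(chi2, 2), round(chi2_sf_even_df(chi2, df), 2))
```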

There were 10 dose levels tested, including the controls, and four parameters, $\alpha_1$, $\alpha_2$, $\beta$, C,
have been estimated, leaving 6 degrees of freedom. Clearly there is no indication of heterogeneity.
Had $\chi^2$ been larger, the possibility that large contributions came from classes with
very small expectations would have required consideration, and grouping of classes in the
usual manner might then have been needed. The previous analysis (Finney, 1947a, § 28), in
which some classes were grouped together, gave

- 1 - 07 - 

Since there is no heterogeneity, the matrix V in equation (42) gives the variances and
covariance of $\beta$ and $\delta C/(1 - C)$. In particular

$$ V(\beta) = 0.04485, $$

whence $\beta = 2.798 \pm 0.212$, and

$$ V(C) = 0.0014412 \times (0.83)^2, $$

whence $C = 16.96 \pm 3.16\,\%$.
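The standard errors above follow directly from the diagonal of V. In this sketch the factor 0.83 is the provisional 1 − C used in the text. It gives 0.212 for β and about 3.15 % for C, against the printed 3.16 %, a difference in the final figure only.

```python
from math import sqrt

V_beta, V_rel = 0.04485, 0.0014412     # diagonal elements of V, equation (42)
beta, C, one_minus_C = 2.798, 0.1696, 0.83

se_beta = sqrt(V_beta)                  # standard error of beta
se_C = sqrt(V_rel * one_minus_C**2)     # V(C) = V(dC/(1 - C)) * (1 - C)^2
print(f"beta = {beta} +/- {se_beta:.3f}; C = {100 * C:.2f} +/- {100 * se_C:.2f} %")
```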

The logarithm of the relative potency of the two derris roots, or the logarithm of the ratio 
of equally effective doses, is 

M = n 2 —/q 

-(«!-«■ W 

= [vi-lit-- *a) - YZq (k- 4 )} jp . (44) 

= -0-242. . 

Since g = < 2 F(£)/A 2 

= 0-022 



250 Estimation of the parameters of tolerance distributions 

is small (note that $t$ here is the 5 % unit normal deviate, or 1.96, and has no connexion with the auxiliary variate), the variance of $M$ can be found and used according to standard rules. The result is


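$$V(M) = \frac{1}{\beta^2}\left[\frac{1}{S_{nw_1}} + \frac{1}{S_{nw_2}} + (\bar t_1 - \bar t_2)^2\,V\!\left(\frac{\delta C}{1 - C}\right) + 2(M + \bar x_1 - \bar x_2)(\bar t_1 - \bar t_2)\,\mathrm{Cov}\!\left(\beta, \frac{\delta C}{1 - C}\right) + (M + \bar x_1 - \bar x_2)^2\,V(\beta)\right]$$

$$= \left[\frac{1}{96.9} + \frac{1}{96.3} + (0.1908)^2 \times 0.001441 + 2 \times 0.102 \times 0.1908 \times 0.003827 + (0.102)^2 \times 0.044848\right] \div 7.8288$$

$$= (0.010320 + 0.010384 + 0.000052 + 0.000149 + 0.000467) \div 7.8288 = 0.00273. \tag{45}$$

Hence $M = -0.242 \pm 0.052$.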
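The chain from (44) to (45) can be reproduced with a few lines of Python. This is a sketch using the rounded intermediate quantities printed above ($S_{nw}$ values 96.9 and 96.3, $\bar t_1 - \bar t_2 = 0.1908$, $M + \bar x_1 - \bar x_2 = 0.102$). It is not a general implementation, and the names are ours.

```python
import math

beta = 2.798
alpha1, alpha2 = 1.564, 2.242
M = (alpha1 - alpha2) / beta          # equation (44)

z = 1.96                              # 5 % normal deviate (not the auxiliary variate)
V_beta = 0.044848
g = z**2 * V_beta / beta**2           # g = t^2 V(beta) / beta^2

# The five bracketed terms of equation (45), with the values quoted in the text
terms = [
    1 / 96.9,                         # 1 / S(nw) for the first root
    1 / 96.3,                         # 1 / S(nw) for the second root
    0.1908**2 * 0.001441,             # (t1 - t2)^2 V(deltaC/(1 - C))
    2 * 0.102 * 0.1908 * 0.003827,    # 2(M + x1 - x2)(t1 - t2) Cov
    0.102**2 * V_beta,                # (M + x1 - x2)^2 V(beta)
]
V_M = sum(terms) / beta**2
se_M = math.sqrt(V_M)

print(f"M = {M:.3f}, g = {g:.3f}, V(M) = {V_M:.5f}, s.e. = {se_M:.3f}")
```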

By taking antilogarithms of $M$ and of $M$ decreased or increased by $1.96 \times 0.052$, the statement may be made that W. 213 has a potency 57.3 % of that of W. 214, and that the fiducial limits of this estimate are 72.4 and 45.3 %. All these results are the same as in the previously published analysis, but the method of calculation given here is simpler and more straightforward. Calculation of the ‘exact’ fiducial limits, as required when $g$ is large, presents no special difficulties, and formulae given elsewhere (Finney, 1947a, formula (4.7)) may be adapted for the purpose.
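The conversion from $M \pm 1.96\,\text{s.e.}$ to a percentage potency is just a matter of antilogarithms, as the following sketch shows (variable names ours).

```python
M, se_M = -0.242, 0.052
z = 1.96                                  # 5 % normal deviate

potency = 10 ** M                         # antilog of log relative potency
upper = 10 ** (M + z * se_M)
lower = 10 ** (M - z * se_M)

print(f"potency {100 * potency:.1f} %, limits {100 * upper:.1f} % and {100 * lower:.1f} %")
```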

(ii) Wadley's problem

Wadley (1949) propounds and solves an interesting variant of the standard probit problem. In experiments on the control of immature stages of fruit flies, samples of fruit may be treated and the number of flies which survive and develop may be counted. The total number of flies treated, however, can be discovered only by dissection of the fruit and counting of the dead flies, a laborious procedure. An alternative is to take a parallel sample of untreated fruit and to use the number of flies developing from this as an estimate of the numbers exposed to treatment in the other samples. Under certain assumptions, this problem also comes under the general theory of § 3, and the calculations based upon Wadley’s solution can be simplified.

If before treatment insects are distributed at random in the fruit, so that the number in the standard size of sample used in the tests is a sample from a Poisson distribution of mean $N$, the probability of observing $s$ survivors in a sample subjected to a treatment which has a probability $P$ of killing an insect is

$$F = e^{-NQ}\,\frac{(NQ)^s}{s!}, \tag{46}$$

where $Q = 1 - P$. Now equation (9) was given in terms of the probability of $r$ responses, but it could equally well have referred to the probability of $s$ non-responses and the same theory would have followed. Hence equation (46) shows the general theory to be applicable to the present problem, with

$$J = 0, \qquad K = N. \tag{47}$$

It is not essential that an estimate of $N$ from an untreated sample should be available, any more than in the problem of § 4 (i) an estimate of $C$ from a control batch was essential, but the precision of estimation of the parameters may be much reduced if this information is lacking.

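From equation (46)

$$\log F = \text{const.} - NQ + s\log(NQ),$$

whence, by differentiating twice with respect to the argument of $F(\,)$ and putting $s = NQ$ (at which point the first derivative of $\log F$ vanishes, so that $F''/F = (\log F)'' = -s/(NQ)^2$), the expected value of $F''/F$ is found as

$$\frac{F''}{F} = -\frac{1}{NQ}.$$

Hence, from equation (19),

$$W = NZ^2/Q. \tag{48}$$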
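The step to $F''/F = -1/(NQ)$ can be checked numerically. The sketch below differentiates (46) twice with respect to $\lambda = NQ$ by a central difference, holding $s$ fixed at $\lambda$. `math.lgamma` is used so that non-integer $s$ is allowed. The helper names and the value $\lambda = 20$ are illustrative only.

```python
import math

def poisson_prob(lam, s):
    """Equation (46) with lam = N*Q; lgamma allows non-integer s such as s = NQ."""
    return math.exp(-lam + s * math.log(lam) - math.lgamma(s + 1))

def second_derivative_ratio(lam, s, h=1e-3):
    """F''/F with respect to lam, by a central second difference."""
    f0 = poisson_prob(lam, s)
    f_plus, f_minus = poisson_prob(lam + h, s), poisson_prob(lam - h, s)
    return (f_plus - 2 * f0 + f_minus) / (h * h * f0)

lam = 20.0                                  # illustrative value of NQ
ratio = second_derivative_ratio(lam, s=lam)
print(ratio, -1 / lam)                      # both close to -0.05
```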


This is equivalent to a weighting coefficient

$$w = Z^2/Q, \tag{49}$$

multiplied by $N$, the expected number of flies tested at a dose. The weighting coefficient is dependent only on the form of the tolerance distribution and may easily be tabulated. The auxiliary variate


$$t = -Q/Z \tag{50}$$

is introduced, and equations (26) then give

$$\beta S_{xx} + \frac{\delta N}{N}\,S_{xt} = S_{xy},$$

$$\beta S_{xt} + \frac{\delta N}{N}\,S_{tt} = S_{ty}. \tag{51}$$

The contributions from a control, untreated, sample are very simple: $S_{tt}$ must be increased by $N$ and $S_{ty}$ by $(s_0 - N)$, where $s_0$ is the number of flies developing from this sample.

The procedure for estimation of $N$ and the parameters of the tolerance distribution then follows the usual plan. First a provisional value of $N$ is guessed for the data, $s_0$ being a useful guide to this. The empirical proportion killed at any dose is taken as

$$p = 1 - s/N, \tag{52}$$

from which empirical $y$-values (whether probits or as defined by some other tolerance distribution) are found. A provisional regression line then leads to values of $t$ and $y$, from which equations (51) are constructed. The solutions of these, with

$$\alpha = \bar y - \beta\bar x - \frac{\delta N}{N}\,\bar t, \tag{53}$$


give revised estimates of the three parameters, and iteration may proceed for as many cycles 
as seem needed. The calculations are exactly equivalent to those given by Wadley (1949), 
but they simplify the method, especially because they invoke familiar ideas of regression and 
lead to a test of heterogeneity that does not require the calculation of every expected 
frequency. 
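One cycle of this procedure, for a normal tolerance distribution, might be sketched in Python as below. The working probit is taken as $y = Y + (p - P)/Z$, which is our assumption of the standard probit working deviate. The helper names (`wadley_cycle` and so on) are hypothetical. The usage example is a self-check on noise-free data generated from known parameters, not Wadley's data. Because the maximum likelihood fit is exact for such data, repeated cycles should recover the generating values.

```python
import math

def normal_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def normal_upper(u):
    """Q = 1 - P for a normal deviate u."""
    return 0.5 * math.erfc(u / math.sqrt(2.0))

def wadley_cycle(xs, ss, s0, N, alpha, beta):
    """One cycle for a normal tolerance distribution.

    xs: dose metameters; ss: survivors at each dose; s0: survivors in the
    untreated sample; (N, alpha, beta): provisional values.
    Returns revised (N, alpha, beta).
    """
    rows = []
    for x, s in zip(xs, ss):
        Y = alpha + beta * x                 # expected probit
        Z, Q = normal_pdf(Y - 5.0), normal_upper(Y - 5.0)
        p = 1.0 - s / N                      # equation (52); may be negative
        weight = N * Z * Z / Q               # N w, equations (48)-(49)
        y = Y + (p - (1.0 - Q)) / Z          # working probit (assumed form)
        t = -Q / Z                           # auxiliary variate, equation (50)
        rows.append((weight, x, t, y))

    W = sum(r[0] for r in rows)
    mean = [None] + [sum(r[0] * r[i] for r in rows) / W for i in (1, 2, 3)]

    def S(i, j):                             # weighted sum of products of deviations
        return sum(r[0] * (r[i] - mean[i]) * (r[j] - mean[j]) for r in rows)

    Sxx, Sxt, Sxy = S(1, 1), S(1, 2), S(1, 3)
    Stt = S(2, 2) + N                        # control contribution to S_tt
    Sty = S(2, 3) + (s0 - N)                 # control contribution to S_ty

    det = Sxx * Stt - Sxt * Sxt              # solve equations (51)
    new_beta = (Stt * Sxy - Sxt * Sty) / det
    k = (Sxx * Sty - Sxt * Sxy) / det        # k = deltaN / N
    new_alpha = mean[3] - new_beta * mean[1] - k * mean[2]   # equation (53)
    return N * (1.0 + k), new_alpha, new_beta

# Self-check on noise-free data generated from known (illustrative) parameters.
xs = [0.778, 0.699, 0.602, 0.477, 0.301, 0.0]
N_true, a_true, b_true = 1030.0, 4.25, 3.8
ss = [N_true * normal_upper(a_true + b_true * x - 5.0) for x in xs]

N, a, b = 1000.0, 4.3, 3.7                   # provisional values
for _ in range(25):
    N, a, b = wadley_cycle(xs, ss, N_true, N, a, b)
print(round(N, 2), round(a, 4), round(b, 4))
```

Each cycle is a scoring step. As the text notes, the analogy with multiple regression is formal, and a negative $p$ is handled without special treatment.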

On account of sampling variation in the actual numbers of individuals tested at different doses, values of $s$ for some low doses may exceed $N$, and the empirical proportion surviving according to equation (52), $q = s/N$, is then greater than 1. The data for these doses will play their proper part in increasing the estimate of $N$ if the calculations are performed exactly as described here, using a $q$ greater than 1 or a negative $p$ in the formation of each working equivalent deviate. Though the method of calculation recommended in this paper has a close resemblance to multiple regression methods, it is in reality a method of solving the complicated non-linear maximum likelihood equations; the occurrence of a $q$ greater than unity is an indication that the analogy is not perfect, and not a condemnation of the method.

Once again equations (49)-(53) are perfectly general and apply to any tolerance distribution of the type of (1). Chief interest, however, will attach to the normal tolerance distribution. Wadley has given a table of the weighting coefficient for this distribution; Table 2 is an extension of this comparable to the usual tables of probit weighting coefficients.

Table 2. The weighting coefficient, $Z^2/Q$

| Expected probit $Y$ | $Z^2/Q$ | Expected probit $Y$ | $Z^2/Q$ |
|---|---|---|---|
| 1.1 | 0.00000004 | 5.1 | 0.34242 |
| 1.2 | 0.00000009 | 5.2 | 0.36344 |
| 1.3 | 0.0000002 | 5.3 | 0.38069 |
| 1.4 | 0.0000004 | 5.4 | 0.39359 |
| 1.5 | 0.0000008 | 5.5 | 0.40173 |
| 1.6 | 0.000002 | 5.6 | 0.40488 |
| 1.7 | 0.000003 | 5.7 | 0.40296 |
| 1.8 | 0.000006 | 5.8 | 0.39612 |
| 1.9 | 0.00001 | 5.9 | 0.38466 |
| 2.0 | 0.00002 | 6.0 | 0.36904 |
| 2.1 | 0.00004 | 6.1 | 0.34983 |
| 2.2 | 0.00006 | 6.2 | 0.32770 |
| 2.3 | 0.00011 | 6.3 | 0.30338 |
| 2.4 | 0.00019 | 6.4 | 0.27760 |
| 2.5 | 0.00031 | 6.5 | 0.25109 |
| 2.6 | 0.00051 | 6.6 | 0.22452 |
| 2.7 | 0.00081 | 6.7 | 0.19848 |
| 2.8 | 0.00128 | 6.8 | 0.17348 |
| 2.9 | 0.00197 | 6.9 | 0.14993 |
| 3.0 | 0.00298 | 7.0 | 0.12813 |
| 3.1 | 0.00443 | 7.1 | 0.10829 |
| 3.2 | 0.00647 | 7.2 | 0.09051 |
| 3.3 | 0.00926 | 7.3 | 0.07482 |
| 3.4 | 0.01302 | 7.4 | 0.06118 |
| 3.5 | 0.01798 | 7.5 | 0.04948 |
| 3.6 | 0.02439 | 7.6 | 0.03958 |
| 3.7 | 0.03251 | 7.7 | 0.03132 |
| 3.8 | 0.04261 | 7.8 | 0.02452 |
| 3.9 | 0.05491 | 7.9 | 0.01899 |
| 4.0 | 0.06959 | 8.0 | 0.01455 |
| 4.1 | 0.08677 | 8.1 | 0.01103 |
| 4.2 | 0.10648 | 8.2 | 0.00827 |
| 4.3 | 0.12863 | 8.3 | 0.00614 |
| 4.4 | 0.15300 | 8.4 | 0.00451 |
| 4.5 | 0.17926 | 8.5 | 0.00327 |
| 4.6 | 0.20692 | 8.6 | 0.00235 |
| 4.7 | 0.23540 | 8.7 | 0.00167 |
| 4.8 | 0.26398 | 8.8 | 0.00118 |
| 4.9 | 0.29189 | 8.9 | 0.00082 |
| 5.0 | 0.31831 | 9.0 | 0.00057 |

The auxiliary variate differs only in sign from that of equation (36) and may therefore be read from Finney’s Table II (1947a).
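For readers wishing to extend or check Table 2, both the weighting coefficient and the auxiliary variate of equation (50) can be computed directly from the normal ordinate $Z$ and tail area $Q$. This Python sketch uses the standard library's `math.erfc` and names of our own choosing. It is a convenience and is not the method by which the table was originally prepared.

```python
import math

def weighting_coefficient(Y):
    """Z^2/Q for expected probit Y (normal tolerance distribution)."""
    u = Y - 5.0
    Z = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)   # normal ordinate
    Q = 0.5 * math.erfc(u / math.sqrt(2.0))                  # Q = 1 - P
    return Z * Z / Q

def auxiliary_variate(Y):
    """t = -Q/Z of equation (50)."""
    u = Y - 5.0
    Z = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return -0.5 * math.erfc(u / math.sqrt(2.0)) / Z

for Y in (3.0, 4.0, 5.0, 6.0, 7.0):
    print(f"{Y:.1f}  w = {weighting_coefficient(Y):.5f}  t = {auxiliary_variate(Y):.4f}")
```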

Wadley gives a numerical example of his problem, and the same data may be used to 
illustrate the method of calculation now proposed. The ‘dose’ is here measured by length of
exposure to treatment. Wadley investigated the probit regression on number of days of 
exposure; this seems to lead to some inconsistency, for the regression equation so derived 
shows a considerable (about 20 %) death-rate for day zero. The regression may be effectively 
linear between 1 and 6 days, yet depart from linearity between 0 and 1, but that would 
correspond to a peculiar form of time-tolerance distribution. As an alternative, use of the 
logarithm of the number of days as a dose metameter seems worth trying; this, in fact, appears 
to be in at least as good agreement with the data as Wadley’s supposition, and has the merit 
of internal consistency. 


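Table 3. Effect of duration of treatment on development of fruit flies

| $x$ | $s$ | $p$ ($N = 1034$) | Empirical probit | $Y$ | $Nw$ | $t$ | $y$ | $Nwx$ | $Nwt$ | $Nwy$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.778 | 4 | 0.996 | | 7.19 | 95 | −0.393 | 7.47 | 73.910 | −37.336 | 709.66 |
| 0.699 | 32 | 0.969 | | 6.90 | 155 | −0.438 | 6.87 | 108.345 | −67.890 | 1064.85 |
| 0.602 | 55 | 0.947 | 6.62 | 6.54 | 249 | −0.507 | 6.61 | 149.898 | −126.243 | 1645.89 |
| 0.477 | 158 | 0.847 | 6.02 | 6.07 | 368 | −0.632 | 6.02 | 175.536 | −232.576 | 2215.36 |
| 0.301 | 396 | 0.617 | 5.30 | 5.42 | 409 | −0.923 | 5.29 | 123.109 | −377.507 | 2163.61 |
| 0.000 | 715 | 0.309 | 4.50 | 4.29 | 131 | −2.456 | 4.52 | 0.000 | −321.605 | 592.12 |
| Controls | 1070 | | | | | | | | | |
| Total | | | | | 1407 | | | 630.798 | −1163.156 | 8391.48 |

$\bar x = 0.4483$, $\bar t = -0.8267$, $\bar y = 5.9641$.

| | $S(Nwx^2)$ | $S(Nwxt)$ | $S(Nwt^2)$ | $S(Nwxy)$ | $S(Nwty)$ | $S(Nwy^2)$ |
|---|---|---|---|---|---|---|
| Sums | 344.2602 | −377.068 | 1393.38 | 3995.24 | −6430.54 | 50954.28 |
| Correction for means | 282.8046 | −521.476 | 961.57 | 3762.14 | −6937.17 | 50047.57 |
| Deviations | 61.4556 | 144.408 | 431.81 | 233.10 | 506.63 | 906.71 |
| Control contribution | | | 1034.00 | | 36.00 | |
| Deviations with control | | | 1465.81 | | 542.63 | |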


Table 3 shows the second cycle of calculations for these data. The first cycle had given

$$N = 1034, \qquad Y = 4.295 + 3.726x$$

as approximations to the maximum likelihood estimates, $x$ being the logarithm of the number of days. Consequently, for the second cycle equation (52) gave

$$p = 1 - s/1034$$

as the empirical death-rate. The expected probit, $Y$, was calculated from the regression equation obtained in the first cycle; in view of the large weights of the observations, two places of decimals were taken. Linear interpolation in Table 2 gave values of $w$ for each $Y$, and these were multiplied by 1034 to give the total weight for each observation. Since $N$ is the same for each line of the table, this multiplication could have been deferred until the end of the calculations; it is sometimes convenient, however, to have records of the weights of each observation, and this arrangement can take account of differences in the sizes of sample used at different doses by making the multiplier of $w$ the appropriate fraction of $N$. Working probits, $y$, were formed from the table of Finney & Stevens (1948), in order to avoid interpolation for $Y$, but linear interpolation in other tables would be sufficiently exact.

The remainder of Table 3 follows the familiar pattern. To the sum of squares of deviations of $t$ was added $N$, and to the sum of products of deviations of $t$ and $y$ was added $(s_0 - N)$, or 36.00. Equations (51), for $\beta$ and $\delta N/N$, then became

$$61.4556\beta + 144.408\,\frac{\delta N}{N} = 233.10, \qquad 144.408\beta + 1465.81\,\frac{\delta N}{N} = 542.63.$$

The inverse matrix of the coefficients of $\beta$ and $\delta N/N$ is

$$V = \begin{pmatrix} 0.0211735 & -0.0020860 \\ -0.0020860 & 0.0008877 \end{pmatrix},$$

whence were derived

$$\beta = 3.8036, \qquad \frac{\delta N}{N} = -0.00455.$$

By substitution of the provisional value of $N$,

$$\delta N = -4.7,$$

and the revised estimate of $N$ is therefore 1029.3. Again

$$\alpha = \bar y - \beta\bar x - \frac{\delta N}{N}\,\bar t = 4.255,$$

so that the revised regression equation is

$$Y = 4.255 + 3.804x.$$
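The solution of equations (51) from the sums in Table 3 can be checked with a small Python sketch (names ours). Because the inverse matrix is not rounded here, $\delta N/N$ comes out as about $-0.00453$ rather than the printed $-0.00455$. The revised $N$ and $\alpha$ are unchanged at the precision quoted.

```python
# Second-cycle sums from Table 3, control contributions already added.
Sxx, Sxt, Stt = 61.4556, 144.408, 1465.81
Sxy, Sty = 233.10, 542.63
x_bar, t_bar, y_bar = 0.4483, -0.8267, 5.9641
N_prov = 1034

det = Sxx * Stt - Sxt ** 2
V = [[Stt / det, -Sxt / det],             # inverse of the coefficient matrix
     [-Sxt / det, Sxx / det]]

beta = V[0][0] * Sxy + V[0][1] * Sty
k = V[1][0] * Sxy + V[1][1] * Sty          # k = deltaN / N
N_new = N_prov * (1 + k)
alpha = y_bar - beta * x_bar - k * t_bar   # equation (53)

print(f"beta = {beta:.4f}, dN/N = {k:.5f}, N = {N_new:.1f}, alpha = {alpha:.3f}")
```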

The iterative process could be continued from these results, but the new column of expected 
probits is so nearly the same as that in Table 3 that further calculation is not worth while. 
The heterogeneity test (equation (28)) here takes the form

$$\chi^2_{[4]} = 906.71 - \beta S_{xy} - \frac{\delta N}{N}\,S_{ty} \tag{54}$$

$$= 22.55,$$

since 4 degrees of freedom remain after the estimation of three parameters. This $\chi^2$ is clearly highly significant, but Wadley found an equally great indication of heterogeneity in his analysis using ‘days’ instead of ‘log days’ for $x$; evaluation of the expected frequencies, $NQ$, corresponding to each $s$ shows that the large $\chi^2$ is not attributable to a large contribution from one class with small expectation and also that the deviations of $s$ from $NQ$ do not follow any systematic pattern. Genuine heterogeneity of the experimental material, as judged by the standard of the Poisson and binomial distributions used in the derivation of the statistical technique, seems the most likely explanation. Non-normality of the distribution of log tolerances would show itself as a curvature of the probit regression line, but departure from a Poisson distribution in the numbers of flies per fruit (a situation very likely to obtain in reality) might reduce the true weights of observations and so permit erratic deviations from the regression line which would appear to be significant when the Poisson weights were used. Wadley has already suggested such heterogeneity. The theoretical consequences need further investigation, but the difficulty is likely to be overcome in large part by the use of a heterogeneity factor (Finney, 1947a, § 18). This is

$$\chi^2/4 = 5.64,$$

and is assigned 4 degrees of freedom.

From the variance matrix, $V$,

$$V(N) = 5.64 \times 0.0008877 \times 1034^2 = 5353,$$

and therefore $N = 1029.3 \pm 73.2$,


the standard error being based on only 4 degrees of freedom. The estimate of the mean number of flies exposed in each test is not very reliable and does not differ significantly from the figure of 1070 based on the controls alone. An estimate of LD50, the ‘dose’ (or in this example the time) expected to kill 50 % of the flies, can be derived from the value of $x$ which makes $Y = 5$. This is

$$m = (5 - \alpha)/\beta = 0.196. \tag{55}$$
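The heterogeneity test (54), the heterogeneity factor, $V(N)$ and the log LD50 (55) follow from the second-cycle results by simple arithmetic. A sketch, with our own names, is given below. It yields $\chi^2 = 22.56$ against the printed 22.55, a difference due only to the rounding of $\beta$ and $\delta N/N$.

```python
import math

S_yy, S_xy, S_ty = 906.71, 233.10, 542.63   # from Table 3, control included
beta, k = 3.8036, -0.00455                  # beta and deltaN/N, second cycle
alpha = 4.255

chi2 = S_yy - beta * S_xy - k * S_ty        # equation (54)
df = 7 - 3                                  # 6 doses plus control, less 3 parameters
h = chi2 / df                               # heterogeneity factor

V_N = h * 0.0008877 * 1034 ** 2             # deltaN = 1034 * (deltaN/N)
se_N = math.sqrt(V_N)
m = (5 - alpha) / beta                      # equation (55)

print(f"chi2 = {chi2:.2f}, h = {h:.2f}, V(N) = {V_N:.0f}, s.e.(N) = {se_N:.1f}, m = {m:.3f}")
```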


After expressing $m$ in terms of $\bar y$, $\beta$, $\delta N/N$, by use of equation (53), its variance can be shown to be

$$V(m) = \frac{1}{\beta^2}\left[\frac{1}{S_{Nw}} + \bar t^2\,V\!\left(\frac{\delta N}{N}\right) - 2(m - \bar x)\,\bar t\,\mathrm{Cov}\!\left(\beta, \frac{\delta N}{N}\right) + (m - \bar x)^2\,V(\beta)\right] \tag{56}$$

when there is no heterogeneity, the variance and covariances being taken from the matrix $V$. Here, the expression on the right-hand side of equation (56) must be multiplied by the heterogeneity factor, giving

$$V(m) = 5.64 \times \left[0.0007107 + (0.827)^2 \times 0.000888 + 0.504 \times 0.827 \times 0.002086 + (0.252)^2 \times 0.021174\right] \div (3.8036)^2$$

$$= 0.00138.$$


Hence the standard error of $m$ is $\pm 0.037$. Since the heterogeneity factor is based on only 4 degrees of freedom, this standard error must be multiplied by 2.78 (the 5 % point of $t$ for 4 degrees of freedom) in order to give the half-width of the 5 % fiducial interval. The fiducial limits to $m$ are therefore 0.299 and 0.093. These values correspond to an estimated LD50 of 1.57 days, with limits at 1.99 and 1.24 days. The legitimacy of using the standard error may be judged from the criterion

$$g = 5.64 \times (2.78)^2 \times 0.02117/(3.804)^2 = 0.064.$$

When $g$ is less than 0.1, for most practical purposes the approximate method for obtaining the limits is sufficiently good; application of Fieller’s method for calculation of the true fiducial limits (Finney, 1947a, formula (4.7)) gives these as 1.94 and 1.17 days.
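Equation (56), scaled by the heterogeneity factor, and the approximate fiducial limits can be reproduced as follows. This is a sketch using the rounded elements of $V$ quoted above, with our own names. It covers only the approximate limits; Fieller's exact limits are not attempted here.

```python
import math

alpha, beta = 4.255, 3.8036
x_bar, t_bar = 0.4483, -0.8267
h = 5.64                                   # heterogeneity factor, chi2/4
S_Nw = 1407                                # total weight, sum of Nw
V_bb, V_bk, V_kk = 0.021174, -0.002086, 0.000888   # elements of V

m = (5 - alpha) / beta                     # log LD50, equation (55)
d = m - x_bar
V_m = h * (1 / S_Nw + t_bar**2 * V_kk - 2 * d * t_bar * V_bk + d**2 * V_bb) / beta**2
se_m = math.sqrt(V_m)

t4 = 2.78                                  # 5 % point of Student's t, 4 d.f.
lower, upper = m - t4 * se_m, m + t4 * se_m
g = h * t4**2 * V_bb / beta**2

print(f"m = {m:.3f}, V(m) = {V_m:.5f}, s.e. = {se_m:.3f}")
print(f"LD50 = {10**m:.2f} days, limits {10**upper:.2f} and {10**lower:.2f}; g = {g:.3f}")
```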


(iii) Related problems

The adjustment of standard probit analyses in order to take account of natural mortality, and the maximum likelihood analysis of Wadley’s problem are instances in which the theorem of § 3 is exactly applicable. Other problems can be treated by very closely related methods. The fitting of the Parker-Rhodes equation (Finney, 1947a, § 45), for example, requires that equation (8) be replaced by

$$Y = \alpha + \beta x^{\lambda}, \tag{57}$$


Biometrika 36 






where $x$ is now the absolute dose, not its logarithm, and $\lambda$ is a third parameter requiring estimation. The maximum likelihood equations, which the writer has derived elsewhere, can be rearranged so as to make the calculations formally equivalent to those for a multiple regression on $x$ and an auxiliary variate

$$t = x^{\lambda}\log x. \tag{58}$$

Quantitative responses whose relationship to dose may be taken as proportional to an integral such as that in equation (3) may also be analysed by a similar procedure; the method follows the lines suggested elsewhere (Finney, 1947a, § 47), but again simplifies the scheme of calculation by the introduction of an auxiliary variate

$$t = P/Z, \tag{59}$$

with the aid of which a close analogy with multiple regression is brought about. Previous 
accounts of both these problems discussed them only for the case in which (1) is a normal 
distribution, but from the present paper it is clear that the methods can be readily adapted 
for use with other distributions dependent only upon a parameter of location and a parameter 
of scale. 

5. Summary

Existing methods for the analysis of data relating the dose level of some stimulus to the proportion of individuals showing a characteristic response are in effect methods for estimating the parameters of an underlying tolerance distribution. Usually the distribution is assumed to be normal in respect of some known dose metameter, and the probit method may then be used. Analogous computational procedures can be used for any other tolerance distribution which is completely specified by a parameter of location and a parameter of scale. In the present paper a general method is developed, applicable to any such distribution, which leads to a convenient computational routine when experimental conditions prevent direct observations of proportions responding. Instances of practical importance are the adjustment of data required when a ‘natural’ response rate is superimposed on that due to the stimulus, and the Wadley problem in which the numbers of individuals exposed to the stimulus must be estimated from a parallel sample instead of counted in the sample tested. When certain simple supplementary tables have been prepared, these and related problems can be dealt with by iterative calculations of the type now familiar in connexion with the standard probit method, instead of by the unusual patterns of computation previously recommended. The process becomes formally identical with that used for multiple linear regression, except that both the dependent and some independent variates are modified at the end of each cycle of iteration.

REFERENCES 

Finney, D. J. (1944). The application of the probit method to toxicity test data adjusted for mortality in the controls. Ann. Appl. Biol. 31, 68-74.

Finney, D. J. (1947a). Probit Analysis: A Statistical Treatment of the Sigmoid Dose Response Curve. Cambridge University Press.

Finney, D. J. (1947b). The principles of biological assay. J. Roy. Statist. Soc. Suppl. 9, 46-91.

Finney, D. J. & Stevens, W. L. (1948). A table for the calculation of working probits and weights in probit analysis. Biometrika 35, 191-201.

Horsfall, J. G. (1945). Fungicides and their Action. Waltham, Mass., U.S.A.: Chronica Botanica Co.

Wadley, F. M. (1949). Dosage-mortality correlation with number treated estimated from a parallel sample. Ann. Appl. Biol. 36, 196-202.



[ 257 ] 


AN OVERLAP PROBLEM ARISING IN PARTICLE COUNTING

By P. ARMITAGE, B.A. 

Medical Research Council Statistical Research Unit, 

London School of Hygiene and Tropical Medicine 


INTRODUCTION

1. The problem with which this paper is concerned has already been briefly discussed by Irwin, Armitage & Davies (1949). In counts of dust particles on a sampling plate, the total number of particles present may be underestimated on account of the overlapping of some of the particles. ‘Clumps’ of particles formed in this way cannot be analysed under the microscope, and will be counted as a single particle instead of two, three, or more. A similar situation arises in bacterial counting, where the number of organisms present on a plate is obtained by counting the number of colonies which develop from them in a certain length of time. Two or more colonies which overlap may be counted as a single colony, in which case the number of colonies present will be underestimated.

The purpose of the present investigation is to obtain a method of correcting an observed 
count of this sort, so as to allow for the possibility of undetected overlaps. A considerable 
gap between theory and practice is probably inevitable, at least in the application to the 
counting of dust particles, because of the variation in the size and shape of the particles. 
It is precisely this variation which makes it impossible for an experimenter to distinguish 
a clump from a single particle. 

The simplest mathematical model is that suggested by Irwin et al. The particles are regarded as circular laminae of equal diameter $\delta$, and we consider the formation of clumps when $N$ particles fall at random on an area $A$. The concentration of the particles on the plate may be measured by the quantity

$$\psi = \pi N\delta^2/(4A)$$

($\psi$ is the ratio of the sum of the areas of the particles to the area of the plate). We shall assume in the discussion of this model that for a given concentration $\psi$, $N$ is sufficiently large, and $\delta^2/A$ sufficiently small, to make it justifiable, on the one hand, to neglect certain ‘edge effects’ due to the presence of particles near the boundary of the plate, and on the other hand to deal merely with the expected numbers of clumps of various sizes, neglecting completely the sampling variation of these numbers.

It may be possible to avoid any bias due to edge effects by adopting the convention that clumps overlapping one half of the boundary are included in the count, while those overlapping the other half are not included. In any case, for a given concentration $\psi$, the edge effects assume diminishing importance as $N$ increases and $\delta^2/A$ decreases.

In the note previously referred to, Irwin et al. have given an approximate formula for the mean clump size. Defining $m = N/C$, where $C$ is the number of clumps on the plate, their formula is (in the present notation)

$$m = \frac{4\psi}{1 - e^{-4\psi}} \tag{1}$$

$$= 1 + 2\psi + O(\psi^2). \tag{2}$$
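Formula (1) is easily evaluated. The Python sketch below (function name ours) uses `math.expm1` so that $1 - e^{-4\psi}$ stays accurate for small $\psi$, and prints the first-order form (2) alongside for comparison.

```python
import math

def m_irwin(psi):
    """Formula (1): m = 4 psi / (1 - exp(-4 psi)), for psi > 0."""
    return 4 * psi / -math.expm1(-4 * psi)

for psi in (0.016, 0.05, 0.1, 0.2):
    print(f"psi = {psi:<5}  m(1) = {m_irwin(psi):.4f}  1 + 2 psi = {1 + 2 * psi:.4f}")
```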



The argument by which (1) is obtained involves the assumption that, in each clump, every particle overlaps every other one. This is not necessarily true for clumps of more than two particles, and we shall show that (1) underestimates the true value of $m$, and is actually valid only as far as the first two terms of the expansion given by (2). In §§ 2-4 of the present paper, an expression for $m$ is obtained which is valid to order $\psi^2$. For small values of $\psi$, such as should occur in a well-planned particle counting experiment, (1) gives, for most practical purposes, a sufficiently good approximation to the true value of $m$ for this model.

Two other models are considered in §§ 5 and 6, in which the particles are regarded (a) as 
circular laminae of different sizes, and (b) as rectangular laminae of constant proportions 
and different sizes. Expressions analogous to (1) are obtained in each case, involving the 
mean and variance of the square root of the area of an individual particle. 

Finally, we suggest a formula for the estimation of $m$ in terms of quantities observable in an actual count, which should be applicable for the types of particles usually encountered in practical work.

CIRCULAR LAMINAE OF EQUAL SIZE

2. As a lemma, we shall need the probability density function (p.d.f.) of the distance $r$ between two points placed at random inside an area $A$, which is large in comparison with $r^2$. Garwood (1947) has given the p.d.f. of $r$ when $A$ is a square, a circle, or a rectangle, of unit area in each case. For a square, for example, the p.d.f. of $r$ is

$$\phi(r) = 2r(\pi - 4r + r^2) \qquad \text{for } 0 < r < 1$$

and

$$\phi(r) = 2r\left(4\sin^{-1}\frac{1}{r} + 4\sqrt{r^2 - 1} - \pi - 2 - r^2\right) \qquad \text{for } 1 < r < \sqrt{2}.$$

For small $r$, we see that

$$\phi(r) = 2\pi r + O(r^2). \tag{3}$$

We shall clearly obtain the same limiting result (3) for a unit area $A$ of any shape, provided that the smallest chord of $A$ is large in comparison with $r$ (a provision which must be made, for instance, in the case of the rectangle).

The limiting expression (3) may be obtained directly from simple considerations. For convenience we shall denote the magnitude of the area (not necessarily unity) by $A$. Given the position of one point, $r$ is less than some value $r_0$ if the second point falls within a circle of radius $r_0$, with centre at the first point. Neglecting edge effects, the probability of this event is

$$P(r < r_0) = \pi r_0^2/A.$$

Any complications due to the boundary will affect a proportionate area of order $r_0/A^{1/2}$ (provided that $r_0$ is small in comparison with the smallest chord of $A$). Hence

$$P(r < r_0) = (\pi r_0^2/A)\{1 + O(r_0^2/A)^{1/2}\} \tag{4}$$

as $r_0^2/A \to 0$.

On differentiating (4) with respect to $r_0$, we have

$$\phi(r) = (2\pi r/A)\{1 + O(r^2/A)^{1/2}\}, \tag{5}$$

which will be seen to be equivalent to (3).
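The size of the boundary correction in (4) can be seen numerically for the unit square. Integrating Garwood's p.d.f. for $0 < r < 1$ gives $P(r < r_0) = \pi r_0^2 - \tfrac{8}{3}r_0^3 + \tfrac12 r_0^4$, which differs from $\pi r_0^2$ by a relative amount of order $r_0$, as (4) asserts. The Python sketch below compares this integrated form with a Monte Carlo estimate. The helper names, seed and sample size are ours.

```python
import math
import random

def garwood_cdf(r0):
    """P(r < r0) for two random points in a unit square, 0 <= r0 <= 1,
    obtained by integrating Garwood's p.d.f. 2r(pi - 4r + r^2)."""
    return math.pi * r0**2 - 8 * r0**3 / 3 + r0**4 / 2

def simulated_cdf(r0, n=100_000, seed=1):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        dx = rng.random() - rng.random()
        dy = rng.random() - rng.random()
        hits += dx * dx + dy * dy < r0 * r0
    return hits / n

for r0 in (0.01, 0.05, 0.2):
    print(r0, math.pi * r0**2, garwood_cdf(r0), simulated_cdf(r0))
```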

3. Any two particles (which we assume to be circular laminae of diameter $\delta$) will overlap if the distance between their centres is less than $\delta$. Let $X$ and $Y$ be the centres of two overlapping particles, and denote by $S$ and $T$ the circles with centres $X$ and $Y$ respectively, and radius $\delta$.



P. Armitage 259 

A third particle will form a triplet with the given particles if it overlaps either or both of them. Two types of triplets may be defined:

(a) A ‘chain’ triplet, in which the third particle overlaps only one of the given particles. This will occur if the third particle centre falls in one of the two non-overlapping parts of $S$ or $T$.

(b) A ‘complete’ triplet, in which the third particle overlaps both the others. This will occur if the third particle centre falls in the area common to $S$ and $T$.

The area of overlap of $S$ and $T$ is

$$2\delta^2\cos^{-1}(r/2\delta) - r\sqrt{\delta^2 - r^2/4}.$$

The probability that a third particle, falling at random on $A$, forms a chain triplet with the two given ones is therefore

$$P_1(\delta, r) = (2/A)\left\{\pi\delta^2 - 2\delta^2\cos^{-1}(r/2\delta) + r\sqrt{\delta^2 - r^2/4}\right\}, \tag{6}$$

and the probability that it forms a complete triplet is

$$P_2(\delta, r) = (1/A)\left\{2\delta^2\cos^{-1}(r/2\delta) - r\sqrt{\delta^2 - r^2/4}\right\}. \tag{7}$$
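Equations (6) and (7) rest on the familiar area of the lens common to two circles. A small Python sketch (names ours) makes the bookkeeping explicit. The chain region is the union of $S$ and $T$ less the lens, counted from both circles, so $P_1 + 2P_2 = 2\pi\delta^2/A$ for every $r$. When the centres coincide, $P_1$ vanishes.

```python
import math

def lens_area(delta, r):
    """Area common to two circles of radius delta with centres r apart (0 <= r <= 2*delta)."""
    return 2 * delta**2 * math.acos(r / (2 * delta)) - r * math.sqrt(delta**2 - r**2 / 4)

def P1(delta, r, A):
    """Equation (6): third centre in the non-overlapping parts of S or T."""
    return 2 * (math.pi * delta**2 - lens_area(delta, r)) / A

def P2(delta, r, A):
    """Equation (7): third centre in the area common to S and T."""
    return lens_area(delta, r) / A

for r in (0.0, 0.5, 1.0):
    print(r, P1(1.0, r, 100.0), P2(1.0, r, 100.0))
```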

If $N$ particles fall at random on $A$, the probability that any two particles, chosen at random out of the $N$, form part of a chain triplet is (asymptotically, for sufficiently large $N$)

$$K_1(\delta) \sim N\int_0^\delta \phi(r)\,P_1(\delta, r)\,dr, \tag{8}$$

where $\phi(r)$ is given by (5); and the probability that they form part of a complete triplet is (again for sufficiently large $N$)

$$K_2(\delta) \sim N\int_0^\delta \phi(r)\,P_2(\delta, r)\,dr. \tag{9}$$


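Any inaccuracies in (6) and (7) due to edge effects will be accounted for by the asymptotic nature of (8) and (9). From (8), (5) and (6), for a given value of $\psi = \pi N\delta^2/4A$, and sufficiently small values of $\delta^2/A$,

$$K_1(\delta) = (4N/A)\int_0^\delta (\pi r/A)\{1 + O(r^2/A)^{1/2}\}\left\{\pi\delta^2 - 2\delta^2\cos^{-1}(r/2\delta) + r\sqrt{\delta^2 - r^2/4}\right\}dr.$$

Using the results

$$\int x\cos^{-1}x\,dx = \tfrac14(2x^2 - 1)\cos^{-1}x - \tfrac14 x\sqrt{1 - x^2} + \text{constant}$$

and

$$\int x^2\sqrt{1 - x^2}\,dx = \tfrac18\left\{\sin^{-1}x - x\sqrt{1 - x^2} + 2x^3\sqrt{1 - x^2}\right\} + \text{constant},$$

we have

$$K_1(\delta) = (4N\pi/A^2)\left\{\pi\delta^4/2 - 8\delta^4\int_0^{1/2} x\cos^{-1}x\,dx + 8\delta^4\int_0^{1/2} x^2\sqrt{1 - x^2}\,dx\right\}\{1 + O(\delta^2/A)^{1/2}\}$$

$$= \frac{3\sqrt{3}\,\pi N\delta^4}{2A^2}\{1 + O(\delta^2/A)^{1/2}\}. \tag{10}$$

Similarly, from (9), (5) and (7), we find that

$$K_2(\delta) = \frac{\pi(4\pi - 3\sqrt{3})N\delta^4}{4A^2}\{1 + O(\delta^2/A)^{1/2}\}. \tag{11}$$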
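The closed forms (10) and (11) can be confirmed by direct numerical quadrature of (8) and (9) with the leading term $\phi(r) = 2\pi r/A$. In the Python sketch below we set $N = A = \delta = 1$, so the targets are $3\sqrt3\,\pi/2$ and $\pi(4\pi - 3\sqrt3)/4$. A plain midpoint rule is our choice, not the paper's method.

```python
import math

def K_integrals(n=20000):
    """Midpoint-rule values of (8) and (9) with phi(r) = 2*pi*r and N = A = delta = 1."""
    h = 1.0 / n
    k1 = k2 = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        lens = 2 * math.acos(r / 2) - r * math.sqrt(1 - r * r / 4)
        phi = 2 * math.pi * r
        k1 += phi * 2 * (math.pi - lens) * h    # P1 = 2(pi - lens), equation (6)
        k2 += phi * lens * h                    # P2 = lens, equation (7)
    return k1, k2

k1, k2 = K_integrals()
print(k1, 3 * math.sqrt(3) * math.pi / 2)
print(k2, math.pi * (4 * math.pi - 3 * math.sqrt(3)) / 4)
```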




Now, suppose that in a count of $N$ particles we find, on the average, $P_1$ isolated particles, $P_2$ doublets, $P_{31}$ chain triplets, and $P_{32}$ complete triplets. We may evaluate $P_1$ easily, for the probability that a particle is isolated is the probability that no other particle centre falls within the circle of radius $\delta$ with centre coinciding with the given particle centre. For a given $\psi$, and values of $\delta^2/A$ sufficiently small, this probability is

$$\exp(-\pi N\delta^2/A) = \exp(-4\psi).$$

For large $N$, therefore,

$$P_1 \sim N\exp(-4\psi) = N\{1 - 4\psi + 8\psi^2 + O(\psi^3)\}. \tag{12}$$

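There are ${}^NC_2$ pairs of particles, and the proportion of this number which overlap but which do not form parts of triplets is

$$\int_0^\delta \phi(r)\,dr - K_1(\delta) - K_2(\delta) \sim \frac{\pi\delta^2}{A} - \frac{\pi(4\pi + 3\sqrt{3})N\delta^4}{4A^2}$$

for large $N$. Hence

$$P_2 \sim \frac{N^2}{2}\left\{\frac{\pi\delta^2}{A} - \frac{\pi(4\pi + 3\sqrt{3})N\delta^4}{4A^2}\right\} = 2N\left\{\psi - \frac{4\pi + 3\sqrt{3}}{\pi}\,\psi^2\right\}. \tag{13}$$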


In order to evaluate $P_{31}$ and $P_{32}$, we remark that out of the ${}^NC_2$ pairs of particles, a proportion $K_1(\delta)$ will overlap and form parts of chain triplets, and a proportion $K_2(\delta)$ will overlap and form parts of complete triplets. Since each chain triplet contains two overlapping pairs, and each complete triplet three overlapping pairs, it follows that

$$P_{31} = {}^NC_2\,K_1(\delta)/2 \quad\text{and}\quad P_{32} = {}^NC_2\,K_2(\delta)/3.$$

From (10) and (11), for large $N$,

$$P_{31} \sim \frac{3\sqrt{3}\,\pi N^3\delta^4}{8A^2} = \frac{6\sqrt{3}}{\pi}\,N\psi^2 \tag{14}$$

and

$$P_{32} \sim \frac{\pi(4\pi - 3\sqrt{3})N^3\delta^4}{24A^2} = \frac{2(4\pi - 3\sqrt{3})}{3\pi}\,N\psi^2. \tag{15}$$

From (14) and (15), the total number of triplets, $P_3$, is given by

$$P_3 = P_{31} + P_{32} = \frac{4(2\pi + 3\sqrt{3})}{3\pi}\,N\psi^2. \tag{16}$$

We have in (12), (13) and (16) obtained asymptotic expressions for $P_1$, $P_2$ and $P_3$, valid for large $N$. (We shall henceforth use these formulae without necessarily indicating their asymptotic nature.) No account, however, has yet been taken of clumps of higher order than triplets. Since these large clumps each contain at least two triplets, and since a triplet will form part of a clump of higher order only if at least one other particle centre falls sufficiently near the triplet (which event is easily seen to have a probability of order $\psi$), it follows that the expected numbers of clumps larger than triplets are of order $P_3\psi$, that is, of order $N\psi^3$.

The number of isolated triplets is therefore, from (16),

$$P_3' = P_3 + N\cdot O(\psi^3) = N\left\{\frac{4(2\pi + 3\sqrt{3})}{3\pi}\,\psi^2 + O(\psi^3)\right\}. \tag{17}$$

It is readily verified from (12), (13) and (17) that

$$P_1 + 2P_2 + 3P_3' = N\{1 + O(\psi^3)\}, \tag{18}$$

as we should expect. The total number of clumps, $C$, is given by

$$C = P_1 + P_2 + P_3' + N\cdot O(\psi^3) = N\left\{1 - 2\psi + \frac{2(4\pi - 3\sqrt{3})}{3\pi}\,\psi^2 + O(\psi^3)\right\}. \tag{19}$$

An interesting point is that, if we were concerned merely to obtain the expression (19) for $C$, we need not evaluate $K_1(\delta)$. For, assuming on a priori grounds that (18) is true, we have, to order $\psi^2$,

$$C = (N - 2P_2 - 3P_3) + P_2 + P_3 = N - P_2 - 2P_3$$

$$= N - {}^NC_2\left\{\int_0^\delta \phi(r)\,dr - K_1(\delta) - K_2(\delta)\right\} - 2\,{}^NC_2\{K_1(\delta)/2 + K_2(\delta)/3\}$$

$$= N - {}^NC_2\left\{\int_0^\delta \phi(r)\,dr - K_2(\delta)/3\right\}, \tag{20}$$

a formula not involving $K_1(\delta)$ explicitly. It may easily be verified that (20) does in fact give the correct result (19).
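The coefficient bookkeeping behind (18) and (19) can be checked in floating point. The Python sketch below records the coefficients of $\psi$ and $\psi^2$ (per particle) in $P_1$, $P_2$ and $P_3$ from (12), (13) and (16). It confirms that these terms cancel in $P_1 + 2P_2 + 3P_3$ and that their plain sum gives the $\psi^2$ coefficient $2(4\pi - 3\sqrt3)/(3\pi) \approx 1.564$ of (19). The tuple layout is our own.

```python
import math

pi, r3 = math.pi, math.sqrt(3)
# (coefficient of psi, coefficient of psi^2), per particle, from (12), (13) and (16)
P1 = (-4.0, 8.0)
P2 = (2.0, -2 * (4 * pi + 3 * r3) / pi)
P3 = (0.0, 4 * (2 * pi + 3 * r3) / (3 * pi))

# (18): P1 + 2 P2 + 3 P3 = N{1 + O(psi^3)}, so these terms should cancel
check_18 = [P1[i] + 2 * P2[i] + 3 * P3[i] for i in (0, 1)]
# (19): C/N = 1 + C_coeffs[0] psi + C_coeffs[1] psi^2 + O(psi^3)
C_coeffs = [P1[i] + P2[i] + P3[i] for i in (0, 1)]

print(check_18, C_coeffs, 2 * (4 * pi - 3 * r3) / (3 * pi))
```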

4. We are now able to compare various formulae which are available as approximations to $m = N/C$.

For a finite value of $\psi$, provided $N$ is sufficiently large and $\delta^2/A$ sufficiently small, we have, from (19),

$$\frac{1}{m} = 1 - 2\psi + \frac{2(4\pi - 3\sqrt{3})}{3\pi}\,\psi^2 + O(\psi^3) \tag{21}$$

$$= 1 - 2\psi + 1.564\psi^2 + O(\psi^3). \tag{21a}$$

The equation (1) which is proposed by Irwin et al. gives, upon expansion,

$$1/m = 1 - 2\psi + 8\psi^2/3 + O(\psi^3), \tag{22}$$

which agrees with (21) to the first degree in $\psi$. A comparison of the coefficients of $\psi^2$ in (21) and (22) shows that, at least for sufficiently small $\psi$, (21) gives the higher value of $m$. This result was to be expected, since the proof suggested by Irwin et al. takes no account of chain overlaps.

It may seem preferable to obtain an expression for $m$ by the following expansion derived from (21):

$$m = \left\{1 - 2\psi + \frac{2(4\pi - 3\sqrt{3})}{3\pi}\,\psi^2 + O(\psi^3)\right\}^{-1}$$

$$= 1 + 2\psi + \frac{2(2\pi + 3\sqrt{3})}{3\pi}\,\psi^2 + O(\psi^3) \tag{23}$$

$$= 1 + 2\psi + 2.436\psi^2 + O(\psi^3). \tag{23a}$$

However, (21) may be seen to give a higher value of $m$ than (23), and since (21) undoubtedly underestimates the true value owing to our neglect of overlaps of higher order than triplets, it is presumably safer to use (21) than (23).
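The three approximations can be compared side by side. In the Python sketch below (function names ours) the $\psi^2$ coefficient of (23) is obtained as $4$ minus that of (21), which is what expanding the reciprocal to second order gives. At the values of $\psi$ tried, (21) exceeds (23), which in turn exceeds (1).

```python
import math

c21 = 2 * (4 * math.pi - 3 * math.sqrt(3)) / (3 * math.pi)   # about 1.564
c23 = 4 - c21                                                 # about 2.436

def m_eq1(psi):
    """Formula (1) of Irwin et al."""
    return 4 * psi / -math.expm1(-4 * psi)

def m_eq21(psi):
    """Reciprocal of (21), neglecting O(psi^3)."""
    return 1 / (1 - 2 * psi + c21 * psi ** 2)

def m_eq23(psi):
    """Expansion (23), neglecting O(psi^3)."""
    return 1 + 2 * psi + c23 * psi ** 2

for psi in (0.016, 0.1, 0.2):
    print(psi, round(m_eq1(psi), 3), round(m_eq21(psi), 3), round(m_eq23(psi), 3))
```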

In Table 1 are shown the results of some sampling experiments which were performed by Mr C. N. Davies and the author as a verification of (1), before the present results were obtained. Each experimenter placed 200 points randomly on a 100 × 100 lattice, by means of random sampling numbers. Circles of various sizes (giving different values of $\psi$) and centred upon these random points were then drawn, and the clumps of various sizes were counted. The corresponding values of $m = N/C$ are shown in the third column of Table 1. The standard errors which are also given are estimated by

$$\text{s.e.}(m) = s/\sqrt{C},$$

where $s$ is the estimated standard deviation (with divisor $C - 1$) of the distribution of clump size.

The theoretical values of $m$ obtained from (1), and from (21) (neglecting in the latter formula terms of order $\psi^3$), are shown in the fourth and fifth columns of Table 1 respectively. It will be observed that the only experimental value of $m$ which appears to differ significantly from either of the theoretical values is that for experiment (b) with $\psi = 0.016$. Nevertheless, it may not in fact do so, because for low values of $\psi$ the estimated standard error will be fairly highly correlated with the observed value of $m$. If, therefore, the value $m = 1.006$ is smaller than the true value, its standard error will also be underestimated.


Table 1. Comparison of experimental and theoretical values of m, for different values of \fr 


, nNS 2 

Experimental value of m 

Theoretical values of to 

V ' 4 A 

and standard error 

From (1) 

From (21) 

0-016 

(a) 1-026 ±0-016 

(b) 1-006 + 0-006 

1-032 

1-032 

0-024 

1-06 ±0-02 

1-06 

1-05 

- 1 3 >1«[■ ilpjp 

1-08 ±0-04 

1-07 

1-07 


(a) 1-15 ±0-03 

(b) 1-14 ±0-03 

1-13 

1-14 


1-28 +0-06 

1-21 

1-22 

ftifiiJlil 

1-38 +0-07 

1-31 

1-34 


1-47 +0-08 

1-43 

1-49 

0-261 

1-71 +0-12 

1-69 

1-68 


The values of fr occurring in dust-particle counting should be considerably less than 0-2. 
The last two columns of Table 1 show that the difference between (1) and (21) is, for most 
practical purposes, immaterial. 


CIRCULAR LAMINAE OF UNEQUAL SIZE 

6 . Suppose that the number of circular particles whose diameter lies between S and 
8 + dS is Nf(S) dS (0 < $< oo). 

The probability that a given particle S of diameter d is not overlapped by any particle 
of diameter between S' and S' + dS 1 is 

exp{-(7r/44) (* + *')* Nf(8')dS'}. 

The probability that S is not overlapped by any other particle is therefore 

exp {- JV/44) {8 + 8')*Nf{8') d<S'j = exp{ - {nNj4A) (S 2 +2 Sv + v' a )}, 
where v is the mean, and v' { the ith moment about zero, of the distribution of S. 







P. Armitage 


263 


In the notation of § 3, 


Pi = J 0)exp {- (nNliA) (S 2 + 28v + r')} d8 

= jym {! - {nNliA) (<S 2 + 2 to + v') +...} dS 

= N{1 — {vN)2A) (v 2 + v' 2 ) + P} 

= N{ 1 - (nNj2A ) (v a + 2r 2 ) + It}, 


where v 2 is the variance of the distribution, of 8, and R is of order (Nv 2 jA) 2 , if we assume that 
pi is of order ^' ^+, (a condition likely to be fulfilled by any distributions arising in practice). 
As a first approximation to m, correct to the first order in (Nv 2 jA), we may write 


G = P!+P a 
= P 1 + (¥-P 1 )/2 
= N{ 1 — (ttN /4A) (r a + 2v 2 )} 
and' m == l + (7rW/4A)(r g + 2r 2 ). ■ 


(24) 


We may note that the value of m given by (24) is midway between the two values obtained 
by substituting for 8 2 in (2), v 2 and v' 2 respectively. 

The problem of evaluating P 3 , in order to obtain an expression for m comparable with (21), 
has proved difficult, but we may note that, in the same way that (1) proves a better approxi¬ 
mation than (2) in the case of equal circles (since the coefficient of >// 2 in the expansion of (1) 
is positive), so we may expect the analogous formula in the case of unequal circles to give 
a better approximation than (24). 

By an argument similar to that of Irwin et al., we obtain the approximation 


m = 




(26) 


where = ttN(v 2 + 2v 2 )/8A. (26) 

It is convenient to replace v and v 2 by the moments fi and /i 2 of Ai ja , the square root of the 
area of the circular particles. Since 



we have 


_/P n 
v , v 2 “ 4’ 


and (26) reduces to ijr — N{/i 2 + 2/P )j 2 A . 

In the case of equal circles, putting [i % = 0 and jj? = 7 t 5 2 /4, (27) reduces to 


(27) 


f = t tN8 s !4A., 

in agreement with the original definition of ijr, as we should expect since (1) and (25) are the 
same formula. 



264 


An overlap problem arising in particle counting 

Rectangular laminae oe unequal size 

6. We shall assume that all the rectangles are of the same proportions, with sides 21 and 
2M in length, k being a constant. Suppose that the p.d.f. of l is g(l) (0 < l < oo). 

A given rectangle L with parameter l will be overlapped by a rectangle L' with parameter 
V , falling at an inclination [i to L, if the centre of L' falls within an area Q, whose boundary is 
shown as the outer thick line in Mg. 1. The inner thick line in Fig. 1 is the boundary of L, 
and the dotted lines show the limiting positions of L 



Fig. 1. Diagram showing the admissible area for the centre of a 
rectangle L' overlapping a rectangle L. 


The admissible area Q is 

4 IV {{ l + 7b 2 )sin/? + 2/c cos /?} +4 k(l* + Z' 2 ) . 

Assuming that all values of /? are equally likely, the probability that L is not overlapped by 
any other rectangle is therefore 

exp j - ^ "^*(4 NIA )g(F) [{(1 + k 2 ) sin /?+ 2k cos /?} IV + k(l* + Z' 2 )] dl' d §j 

= exp J - j”(4N/irA)g(V) [2(1 + h) 2 W+irk(P + V 2 )]dV j 
= exp { — [iNjirA) [2(1 + 7c) 2 lv + 7r/c(Z 2 + r 2 )]}, 

where v is the mean, and v[ is the ith moment about zero of the distribution of l. Hence 
Pi = Ng(l)exp{-(4:NjnA)[2[l + k) 2 lv + 7Tk{l 2 + v l i )]}dl 

= N{1 - (iNfrA ) [2(1 + lcf v 2 + 2 whQ + B} 

= N{l-[&Nj-nA) [{(l + k) 2j r7rk}v s +n]cvA+B}, 

where e 2 is the variance of the distribution of l, and B is of order (Nv 2 /A)*, assuming that 
v{ is of order v iVl . 

As in § 6, we obtain, as a first approximation to to, 

to = l + (4W/m4)[{(l + &) 2 + u&}n 2 + u7b’ 2 ], 




P. Armitaqe 


265 


and as a better approximation 


m = - 


4 ft 
l^er* 


where ft ~ {ZNjnA) [{(1 + kf + 7Tk}v 2 +Trlcv i ]. 

Replacing v and v. 2 by the moments /i and fi. z of fta = 2lftk, we have 

ft = (^/24)[ /i2+ {l+ii^l- 2 j^]. (28) 

We may now compare (27) and (28), since in both formulae ft is expressed in terms of the 
moments of the distribution of the square root of the area of the individual particles. 

For square particles, putting k = 1 in (28), we have 

ft = W(/t a +2-273^ 2 )/24, (29) 

a result, as we should expect, not very different from (27). 

As k->co or /t->0 in (28), ft, and therefore m, increases indefinitely. This, also, was to be 
expected, since needle-shaped particles having areas comparable to those of circular particles, 
must be of infinite length, and each particle may therefore be expected to cross every other 
one. 

For k = £ or k = 5, we have, from (28), 

ft = Nfa + 3-292/i a )/2H. (30) 

Even for such long rectangles the difference between (27) and (30) is not very serious. 


Practical applications op theory 

7. We have, in §§ 5 and 6, proposed that the mean clump size should be estimated by the 
approximation 


m — 


Aft 


1 - e -4 ^ ’ 


(25) 


and have obtained, in (27) and (28), expressions for ft appropriate to the cases where the 
particles are unequal circles, or unequal (but proportionate) rectangles. 

The expression for ft appropriate to any particular practical situation will probably be 
intermediate between (27) and (30) (the latter formula corresponding to the case of rectangles 
with k = -g or k = 5, which are much more needle-like than the shapes which are likely to 
arise in practice). The experimental worker should have some idea of the shapes which are 
likely to occur. In the absence of such knowledge, perhaps a suitable expression for ft 

WOuldbe ft = Nftt 2 + 2-^)/2A = NKj2A, say. (31) 

Using (31), then, and assuming that (i and /< 2 are known, and that C and A are observed 
from the count, we may write N = mO, and, from (25), 

2 KmC/A 


m = 


Hence 


and 


1 _ g-aRmO/a ‘ 

! — e -aKmOM = 2 KOjA 


m — 


A 


2K0 loge (1 — 2/f GlA] ’ 
which may be used as an equation for the estimation of m. 


(32) 



266 


An overlap problem arising in particle counting 

We have assumed, in obtaining (32), that fi and /,i 2 are known, or at least may be estimated 
with sufficient accuracy. We may, for instance, be able to use results obtained from a previous 
count on the same type of particles, in which the concentration was so low that overlapping 
did not occur. If we have to estimate fi and ft % from the count itself, as the mean and variance 
of the square root of the area of a clump, will be overestimated by a factor 

1 + 0(i/r). 

Since (25) is in any case valid only to 0(i/r), we may disregard this source of error. 

This problem was suggested by Mr 0. N. Davies of the Medical Research Council Group 
for Research in Industrial Physiology, and I am indebted to Mr Davies, to my colleague 
Dr J. 0. Irwin, and to Dr H. 0. Lancaster of Sydney for some helpful discussions on the 
mathematical treatment. I should also like to thank Mrs M. G. Young for preparing the 
diagram. 

REFERENCES 

Garwood, E. (1947). The variance of the overlap of geometrical figures with reference to a bombing 
problem. Biometrika, 34, 1. 

lawns', J. 0., Armitage, P. & Davies, C. N. (1949). The overlapping of dust particles on a sampling 
plate. Nature,, Bond., 163, 809. 



[ 267 ] 


TABLES OE AUTOREGRESSIVE SERIES 

By M. G. KENDALL 

In my Contributions to the Study of Oscillatory Time-Series (1946) I gave in full four series 
which had been calculated from the formula 

u M +au l+1 +bu t = e M , (1) 

with certain values of a and b and a rectangular random element e. These series have re¬ 
peatedly been used by other workers to exemplify or to verify theoretical results; see, for 
example, Yule (1945), Bartlett (1946), Quenouille (1947) and Orcutt (1948). It may, therefore, 
be useful to publish further series subsequently prepared for some studies which are not yet 
ready to appear. 

I preserve the numbering of the four series already published and of some additional series 
referred to, but not given, in a later paper (Kendall, 1949). The full set of series, including 
those now published, is as follows: 


No. of 
series 

No. of 
terms 

Nature of 
stochastic element 

Generating equation 

i 

480 

Rectangular 

w <+2 - 1 ■ l« (+ i + O' 6n, = e, + , 

2 

240 

>1 

u t+ a~ l , 2u (+ - l + 0-4-tt, = e (+ , 

3 

240 

>> 

^ +2 ~ l'lw i+1 += e* +a 

4 

240 

>1 

w (+ 2+ 1-0«, +1 +0-6 m,= e (+ , 

6a 

400 


As series 1 

66 

400 



5c 

400 



5d 

400 

}) 


6 (in toto) 

1600 


,, 

6 a 

400 


As series 3 

66 

400 



6c 

400 



6 d 

400 



6 (in toto) 

1600 

>> 

>1 

7 

600 

Normal 

u t+i ~ O’9 u t = e i+1 

8 

300 

Series 7 

As series 1 

9 

600 

Normal 

u, +1 - 0-1u t = e t+1 

10 

600 

Series 9 

As series 1 

11 

600 

Normal 

w (+1 —0’6it ( = e j+l 

12 

600 

Series 11 

As series 1 

13 

600 

Normal 

u t+i — = e t+1 

14 

600 

Series 13 

As series 1 

16 

600 

Normal 

u m -0-lu, = e t+1 

18 

600 

Series 15 

As series 1 


The stochastic element for series 1-6 inclusive was obtained by taking two-figure numbers 
from the Tables of Random Sampling Numbers by Babington Smith and myself, ignoring 
00 and reducing to zero mean by subtracting 50. The ‘ normal ’ element for the odd-numbered 
series 7-15 was obtained from the tables by Mahalanobis and others (1934), which are to 
three decimal places and were derived by converting to normal deviates the Tables of 
Random Sampling Numbers by L. H. 0. Tippett. (When the work was done H. Wold’s (1948) 








268 Tables of autoregressive series 

conversion of tlie Kendall-Babington Smith numbers was not available.) The values of 
these series were themselves used as the stochastic elements in constructing the even- 
numbered series 8-16, so that these latter have the same structural constants as series 1, 
but are based on a stochastic element which is itself autoeorrelated in a Markoff chain. 

If 7} obeys the relation 


+ e <+i> 


( 2 ) 


then u t+i + au l+l + bu t = rj M 

is equivalent to w (+3 + (a + lc) m (+2 +(6 + ah) u t+1 + bhu t = e (+3 , (3) 


that is to say, to a third-order autoregressive series with a random stochastic element. 
Evidently more complicated series may be built up, if desired, from those here given. Con¬ 
versely, any linear autoregressive series can be regarded as built up from those of the first 
or second order by an iterative process in which the stochastic element at each stage is the 
value of the series generated by the previous stage. 

In addition to the series themselves I give, in Tables 19-22, the serial covariances (or 
rather, the serial first-product-moments-about-zero) and the serial correlations for all series 
except the odd-numbered series from 7 to 15, which have not been calculated. A few points 
require explanation: 

(а) 1 Starting up ’ the series. Series 1-4, the subseries of series 5 and 6, and the odd-numbered 
series 7-15 were started up by talcing and % to be random numbers of the same character 
as the stochastic element and beginning the series with %. The subseries of series 5 and 6 
were started up separately. For the even-numbered series 8-16 and u 0 were taken to be 
the first two terms of the corresponding odd-numbered series, two extra terms of the latter, 
numbered 501 and 502, having been computed so as to give 500 terms of the derived series. 

(б) Product-moments. The theoretical mean of all the series is zero (and the actual means 
for series of these lengths are very close to zero); and there are both practical and theoretical 
reasons for calculating product-moments about zero rather than the actual means. I denote 
the first-product-sum-about-zero T,u i u l+lc by c k and give the values of c k for various ranges 

of k in Tables 19-22. 

For series 1-4 and 7-16 the sum c k is taken simply over the values of the series, i.e. is the 
sum of n—k terms beginning with u 1 u 1+k and ending with u n _ Jc u n . To simplify the arithmetic 
for the subseries 5a-6d the sum is taken over 400 terms, the extra k being obtained by a 
circular definition; that is, c k is the sum of terms beginning with u x u 1+k , proceeding through 
u m-k u m^° u m-k u i> M 502 -* w 2 > etc., and ending with u 500 u lc . For the full series 5 and 6, c k was 
taken as the sum of the four corresponding values for the constituent series. 


n 

(c) Serial correlations. As an estimate of the variance, I used c 0 = S u t- The estimate of 
the fcth serial correlation r k is then 


n 

c 0 n—k’ 




but where, as in series 5 and 6 and the subseries thereof, the summation of c k is over n terms, 
the factor n/(n — h) is replaced by unity. 

The estimated serials for series 1-4 published in my brochure were based on the formula 


r co v{ui,u t+k ) 

k V{ vartt i var u,+kY 


( 5 ) 



I G. Kendall 


and lienee may differ slightly from the values of Tables 19 and 20, The differences are negli¬ 
gible for most purposes, at least for series of these lengths, but for short series (4) is better 
than (6), which may introduce a substantial Mas, 

I am very grateful to the National Institute of Social and Economic Research for allowing 
me to call on the help of Miss Joan Ayling in the heavy computing involved in the prepara¬ 
tion of these tables, Miss Ayling did most of the arithmetical work in constructing series 6~16 
and computing the serial coefficients, and it is a pleasure to record my indebtedness to her. 


REFERENCES 

Babott, M. S, (1946), J.B, Stalk Soo, Suppl, 8,27, 

Kendaix, M, G, (1949), Proceedings oj the International Statistical Institute at Washington, 1947 
(in the Press), 

MmiANQBis, P. C, and others (1934). Sanlchp, 1, 289, 

Oroutt, Ct, H, (1948), J.B, Statist , Soo, Series B, 10,1, 

Qubnouille, M, H, (1947), J,B, Statist, Soo, 110, 123, 

Wold, H, (1948), Tracts for Computers, No, Z1F, Cambridge University Press, 

Yule, G, U, (1946), JAStatk Soo, 108, 208, 



270 Tables of autoregressive series 


Table 1. Series 5a- 


No . of 
term 

0 + 

50 + 

100 + 

160 + 

200 + 

260 + 

300 + 

350 + 

1 

57 

10 

9 

36 

-27 

-67 

-46 

-66 

2 

69 

26 

6 

22 

-53 

-50 

-59 

-31 

3 

66 

16 

34 

64 

-67 

24 

-64 

-22 

4 

80 

2 

64 

53 

-86 

32 

-88 

6 

6 

63 

26 

18 

14 

-83 

-14 

-95 

67 

6 

-6 

60 

-31 

-32 

-67 

-12 

-29 

109 

7 

-65 

49 

-77 

-29 

-43 

-23 

-17 

82 

8 

-72 

3 

-66 

3 

-3 

-63 

-9 

57 

9 

-71 

'21 

-73 

65 

66 

-38 

-40 

-1 

10 

-86 

-6 

-69 

70 

30 

7 

-67 

-44 

11 

-08 

-6 

1 

38 

22 

61 

-8 

-29 

12 

-61 

32 

76 

56 

10 

75 

69 

-26 

13 

-48 

29 

91 

66 

-16 

47 

37 

-36 

14 

-69 

63 

74 

64 

6 

6 

46 

12 

16 

-81 

58 

36 

36 

17 

-69 

9 

60 

16 

-99 

26 

-11 

31 

69 

-74 

14 

66 

17 

-31 

34 

-24 

57 

27 

— 83 

-6 

40 

18 

5 

-19 

-1 

60 

-9 

-84 

-34 

-13 

19 

66 

-69 

-15 

11 

-32 

-2 

-60 

-83 

20 

51 

-91 

-59 

22 

-74 

-9 

-68 

-68 

21 

66 

-66 

-17 

48 

-71 

22 

-80 

-19 

22 

23 

-9 

-11 

79 

-64 

-20 

-92 

-5 

23 

10 

60 

-44 

107 

-64 

-66 

-89 

-12 

24 

44 

22 

-90 

94 

-41 

-14 

-78 

-48 

26 

82 

-16 

-47 

16 

20 

29 

-66 

-32 

26 

* 96 

-27 

-15 

10 

66 

81 

-68 

-16 

27 

69 

20 

27 

33 

101 

60 

-12 

-48 

28 

26 

40 

-1 

48 

124 

-18 

8 

-79 

29 

23 

32 

-68 

11 

62 

-50 

-33 

-97 

30 

57 

14 

-61 

-25 

-26 

-54 

-48 

-56 

31 

76 

26 

-29 

-49 

-37 

-65 

-37 

-52 

32 

76 

-13 

-43 

-5 

-55 

-53 

-31 

13 

33 

33 

16 

-14 

31 

-22 

-46 

0 

3 

34 

30 

-2 

-7 

35 

-2 

-19 

14 

-33 

36 

-20 

36 

28 

8 

17 

-30 

2 

3 

36 

-34 

89 

27 

-40 

16 

13 

44 

60 

37 

-06 

126 

-28 

-74 

-19 

47 

22 

42 

38 

-21 

76 

-11 

-35 

4 

74 

-21 

.50 

39 

39 

68 

-2 

-47 

60 

19 

-81 

18 

40 

31 

44 

-28 

-43 

38 

-31 

-108 

-48 

41 

66 

: 21 

-1 

-70 

-36 

-44 

-92 

-29 

42 

42 

-21 

32 

-65 

-96 

-60 

-38 

-43 

43 

21 

-66 

61 

-49 

-72 

-60 

34 

-68 

44 

-6 

-36 

42 

20 

-50 

-87 

43 

-55 

46 

-6 

26 

19 

94 

-3 

-83 

18 

12 

46 

-38 

32 

9 

92 

48 

-71 

-23 

52 

47 

-4 

13 

29 

61 

46 

12 

-68 

10 

48 

-32 

-4 

6 

41 

-18 

70 

-97 

-55 

49 

-19 

21 

-48 

65 

-72 

25 . 

-90 

-37 

60 

-24 

13 

-24 

14 

-90 

-25 

-68 

-69 


Thus , the 242 nd term lies in the column headed ‘200 + ’ and the row 42 , i . e . is — 95 . 




M. G. Kendall 


271 


Table 2. Series 5b 


No . of 
term 

0 -f 

50 + 

100 + 

150 + 

200 + 

250 + 

300 + 

350 + 

1 

-34 

-45 

-28 

111 

-16 

-29 

-23 

16 

2 

10 

-8 

-34 

37 

34 

-30 

-24 

74 

3 

67 

-15 

-11 

30 

1 

-61 

6 

109 

4 

43 

-48 

19 

-9 

-13 

-36 

-25 

35 

5 

26 

-2 

49 

-74 

-11 

-48 

1 

-39 

6 

33 

-10 

61 

-61 

-28 

-46 

-35 

-86 

7 

69 

-31 

-7 

7 

11 

-31 

-25 

-120 

8 

78 

-11 

-6 

20 

22 

-40 

-44 

-80 

9 

96 

29 

18 

11 

68 

-71 

-2 

-17 

10 

49 

36 

8 

23 

51 

-50 

-25 

-18 

11 

47 

28 

-48 

-26 

-3 

-54 

-64 

16 

12 

43 

17 

-90 

-29 

-31 

-33 

-47 

26 

13 

12 

-11 

-63 

-23 

-47 

-28 

24 

28 

14 

-37 

-62 

-46 

-10 

-71 

-58 

67 

60 

15 

-73 

-75 

13 

1 

-67 

-19 

61 

4 

16 

-45 

-16 

67 

-16 

-49 

51 

-11 

-52 

17 

-40 

10 

95 

-32 

-16 

81 

-11 

-86 

18 

-30 

13 

43 

7 

9 

107 

-21 

-89 

19 

-20 

13 

-33 

67 

-6 

116 

24 

-101 

20 

-23 

23 

-63 

67 

-49 

113 

4 

81 

21 

-27 

60 

-13 

19 

-71 

40 

22 

6 

22 

-51 

33 

-9 

-36 

-103 

2 

63 

88 

23 

7 

-10 

22 

-23 

-65 

-53 

82 

97 

24 

77 

-48 

32 


12 

-81 

76 

19 

25 

56 

-96 

39 

-67 

24 

-75 

62 

-32 

26 

72 

-92 

9 

-35 

20 

-11 

-7 

-13 

27 

80 

-48 

-10 

-15 

10 

-4 

— 44 

-28 

28 

72 

15 

1 

48 

-20 

6 

-23 

-24 


75 

81 

26 

88 : 

-1 

28 

-43 

-34 

30 

15 

119 

-7 ■ 

117 

2 

67 

-2 

1 

31 

-17 

134 

-69 

113 

-13 

26 

57 

-6 

32 

-27 

67 

-66 

96 

-4 

-48 

88 

15 

33 

-3 

26 

-13 


-31 

-33 

89 

16 

34 

46 

-54 

56 

34 

-28 

18 

63 

-23 

35 

91 

-89 

104 

4 

-20 

-6 

11 

-69 

36 

125 

-23 

95 

26 

-43 

-20 

-49 

-68 

37 

81 

-18 

55 

6 

16 

-32 

-29 

-10 

38 

16 

18 

59 

-6 

73 

-69 

-29 

6 


-6 

78 

26 

31 

105 

-77 

16 

6 


24 

62 

-22 

21 

37 

-18 

42 

-36 

41 

64 

19 

13 

-34 

-6 

28 

46 

-16 

42 

39 

-13 

10 

-77 

-37 

76 

18 

-8 

43 

-23 

-28 

-23 


-20 

119 

-13 

2 

44 

-65 

-6 

-37 


24 

62 

13 

-1 

45 

-38 

45 

-59 

1 ' 32 

1 

1 

-9 

-2 

46 

-56 

70 

-79 

77 

-20 ' 

-6 

13 

17 

47 

-87 

11 

-43 

32 

-61 

24 

42 

-26 

48 

-47 

12 

27 

-42 

-33 

-9 

1 

-79 

49 

-47 

40 

92 

-59 

19 

-39 

-44 

-86 

m 

-29 

/- 

22 

110 

-41 

-4 

-81 

-24 

-36 


•Biometrika 36 


18 
















M. G. Kendall 


273 


Table 4. Series 5 d 


No . of 
term 

0 + 

50 + 

100 + 

150 + 

200 + 

250 + 

300 + 

350 + 

1 

8 

68 

-52 

63 

18 

-12 

69 

-9 

2 

24 

117 

-79 

39 

11 

-28 

22 

27 

3 

21 

81 

-49 

23 

18 

-64 

-62 

-6 

4 

38 

-17 

-2 

-8 

41 

-42 

-52 

-61 

5 

45 

-72 

60 

-33 

77 

-24 

17 

-62 

6 

-9 

-29 

49 

11 

61 

18 

2 

20 

7 

-42 

-42 

22 

47 

29 

73 

-26 

66 

8 

-5 

-55 

18 

9 

16 

67 

-11 

64 

9 

-25 

8 

51 

-60 

-21 

38 

-3 

6 

10 

21 

68 

20 

-96 

9 

-18 

-6 

29 

11 

36 

59 

-12 

-36 

17 

-57 

-23 

11 

12 

0 

5 

-10 

30 

-19 

-27 

6 

-11 

13 

-47 

23 

- 30 

76 

-56 

-11 

18 

17 

14 

-52 

28 

-62 

38 

-43 

44 

-19 

6 

16 

-49 

10 

-71 

10 

-45 

87 

-11 

18 

16 

6 

39 

-90 

-62 

3 

121 

44 

-12 

17 

54 

60 

-63 

- Ill 

29 

79 

54 

-26 

18 

74 

14 

12 

-97 

37 

24 

72 

-8 

19 

56 

19 

45 

-88 

0 

-16 

99 

-15 

20 

63 

-26 

78 

-32 

1 

-9 

48 

1 

21 

-3 

-4 

33 

39 

8 

-40 

39 

-11 

22 

-42 

-32 

42 

33 

-23 

-52 

-18 

35 

23 

-75 

. 2 

24 

-32 

12 

-52 

-30 

70 

24 

-89 

51 

-25 

-56 

11 

-64 

20 

77 

25 

-86 

23 

-27 

-28 

17 ■ 

-60 

86 

22 

26 

-27 

32 

19 

-28 

-3 

-78 

116 

-9 

27 

16 

67 

22 

-51 

-16 

-30 

109 

-08 

28 

56 

34 

27 

-77 

-10 

0 

19 

-32 

29 

77 

-11 

7 

-19 

-9 

56 

-9 

23 

30 

19 

-45 

-16 

16 

-13 

64 

-8 

4 

31 

-43 

-34 

-13 

-11 

-36 

-1 

-18 

-24 

32 

-22 

-60 

-39 

-24 

-39 

-46 

4 

14 

33 

28 

-63 

9 

22 

-42 

-79 

-7 

64 

34 

86 

3 

65 

20 

-24 

-102 

3 

59 

35 

117 

1 

114 

29 

-20 

-84 

-39 

-9 

36 

108 

9 

68 

53 

31 

-29 

-64 

-36 

37 

103 

50 

48 

67 

17 

57 

— 5 

-60 

38 

53 

7 

4 

47 

-39 

94 

33 

-58 

39 

-3 

-23 

12 

-4 

-3 

104 

58 

-62 

40 

-29 

-66 

-14 

-68 

23 

115 

89 

5 

41 

-57 

-90 

-65 

-45 

21 

41 

37 

11 

42 

-92 

-20 

-25 

-53 

-12 

-33 

20 

-3 

43 

-60 

3 

-2 

-35 

-44 

-67 

3 

10 

44 

4 

58 

-27 

11 

-12 

-88 

2 

-26 

45 

14 

33 

-29 

34 

24 

-76 

-45 

-22 

46 

53 

-36 

-55 

41 

24 

-42 

-73 

-46 

47 

11 

-73 

-83 

18 

36 

-17 

-96 

-13 

48 

33 

-96 

-40 

11 

25 

-8 

-70 

-1 

49 

-18 

-95 

29 

14 

50 

49 

-16 

-20 

50 

12 

-60 

62 

I 

6 

4 

99 

18 

-70 


18-3 



274 


Tables of autoregressive series 

Table 5. Series 6 a 


No . of 
term 

0 4 * 

50 + 

100 + 

150 + 

200 + 

260 + 

300 + 

350 + 

1 

46 

-36 ' 

-16 

-5 

147 

98 

48 

-21 

2 

-16 

55 

-71 

64 

121 

42 

16 

-14 

3 

-37 

119 

-96 

80 

-13 

-36 

-10 

-22 

4 

-19 

78 

-90 

15 

-83 

-106 

-36 

0 

5 

17 

38 

11 

-42 

-97 

-134 

-26 

51 

6 

62 

-14 , 

129 

-44 

-78 

-42 

-46 

29 

7 

51 

-35 

93 

23 

-13 

70 

-56 

-42 

8 

47 

8 

46 

104 

26 

108 

-28 

-64 

9 

-8 

71 

-28 

70 

-5 

62 . 

58 

11 

10 

-11 

40 

-79 

0 

-73 

17 

87 

96 

11 

-50 

-60 

-62 

-16 

-93 

6 

84 

48 

12 

-25 

-112 

-21 

-30 

-12 

5 

38 

-36 

13 

30 

-65 

56 

• -36 

78 

-15 

17 

-101 

14 

83 

36 

89 

19 

61 

-62 

-13 

-79 

15 

24 

69 

47 

86 

-42 . 

-14 

9 

11 

16 

-73 

62 

-37 

39 

-103 

21 

2 

124 

' 17 

-122 

-2 

-114 

-61 

-86 

67 

-53 

170 

18 

-49 

-54 . 

-93 

-47 

-24 

93 

-100 

106 

19 

47 

-76 

18 

-21 

30 

67 

-66 

-37 

20 

44 

-36 

75 

-2 

29 

-43 

32 

-111 

21 

-22 

54 

32 

48 

-37 

-55 

40 

-55 

22 

-32 

127 

2 

82 

-20 

-38 

11 

46 

23 

9 

106 

-38 

- 8 

-13 

-40 

-27 

128 

24 

6 

-29 

-56 

-50 

-10 

-37 

-4 

129 

25 

-18 

-141 

-39 

-33 

-26 

24 

18 

12 

26 

-48 

-117 

-30 

29 

21 

' 84 

-9 

-71 

27 

-72 

-18 

-33 

14 

67 

92 

-57 

-107 

28 

-14 

56 

-32 

-36 

19 

30 

-98 

-16 

29 

26 

96 

-4 

-85 

11 

-89 

-93 

100 

30 

87 

60 

17 

-63 

44 

-132 

21 

122 

31 

62 

-3 

18 

40 

0 

-69 

143 

8 

32 

-24 

-8 

42 

137 

-71 

30 

142 

-118 

33 

-20 

-48 

32 

151 

-47 

76 

53 

-179 

34 

-24 

-90 

-42 

87 

-6 

98 

-28 

-61 

35 

-12 

-100 

-111 

-39 

23 

25 

-105 

37 

36 

43 

-8 

-113 

-118 

-16 

-66 

-70 

85 

37 

101 

38 

-19 

-121 

-66 

-62 

-5 

76 

38 

37 

36 

115 

-10 

-19 

-28 

94 

26 

39 

-44 

7 

142 

91 

68 

-9 

66 

13 

40 

-34 

-16 

64 

100 

101 

-30 

36 

1 

41 

-15 

11 

-44 

10 

10 

-75 

15 

-36 

42 

10 

17 

-148 

— 58 

-24 

-59 

36 

-15 

43 

63 

-20 

-100 

-44 

-49 

-49 

65 

3 

44 

90 

-71 

26 

18 

6 

-21 

10 

23 . 

45 

86 

-40 

118 

31 

22 

44 

-28 

1'6 

46 

39 

12 

98 

-27 

-19 

28 

-39 

-26 

47 

-7 

24 

-12 

-84 

9 

-40 

■ —16 

6 

48 

-79 

2 

■ -71 

-120 

11 

-46 

19 

-8 

49 

-126 

22 

-62 

-22 

34 

-2 

6 

13 

50 

-120 

22 

-58 

102 

70 

54 

-8 

23 


M. G. Kendall 


275 


Table 6, Series 6 b 









276 


Tables of autoregressive series 
Table 7. Series 6 c 


No . of 
term 

0 + 

60 + 

100 + 

160 + 



1 

350 + 

1 

-19 

-167 

3 

-95 

10 

-67 

■1 

-21 

' 2 

-63 

-132 

-66 

-111 

46 

42 

53 

-26 

3 

-9 

-20 

-31 

-29 

61 

77 

69 

-69 

4 

8 

91 

42 

51 

38 

56 

-18 

-76 

6 

-29 

103 

43 

66 

-28 

26 

-26 

-68 

6 

1 

46 

17 

42 

-91 

8 

-64 

-23 

7 

6 

-26 

-63 

-10 

-62 

-58 

-14 

-2 

8 

35 

-no 

-108 

-21 

60 

-27 

77 

-12 

9 

34 

-112 

-77 

20 

114 

47 

61 

-23 

10 

8 

-64 

20 

72 

99 

91 

29 

1 

11 

30 

-16 

46 

62 

59 

82 

-65 

67 

12 

69 

9 

9 

-20 

1 

2 

-114 

86 

13 

85 

29 

-69 

-38 

-65 

-94 

-109 

81 

14 

-12 

32 

-27 

-36 

-72 

-96 

-4 

26 

16 

-80 

4 

3 

28 

-64 

16 

131 

1 

16 

-131 

-37 

42 

102 

-32 

64 

122 

-31 

17 

-67 

-79 

90 

109 

60 

24 

6 

7 

18 

47 

-41 

36 

36 

86 

-30 

-60 

-8 

19 

106 

-7 

4 

-60 

64 

-48 

-82 

-33 

20 

47 

-4 

19 

-108 

-40 

-7 

-62 

-24 

21 

-66 

-37 

62 

-29 

-106 

12 

41 

0 

22 

-97 

-64 

20 

93 

-122 

19 

64 

62 

23 

-89 

-2 

-16 

109 

-49 

46 

3 

93 

24 

-69 

68 

-48 

44 

23 

18 

-6 

98 

26 

7 

71 

-60 

6 

97 

21 

-2 

4 

26 

66 

27 

-17 

-4 

124 

47 

8 

-116 

27 

43 

-61 

66 

-40 

16 

-3 

60 

-136 

29 

34 

-29 

24 

-4 

-106 

-81 

36 

-58 

29 

-33 

65 

-9 

36 

-157 

-126 

33 

62 

30 

-26 

108 

-38 

59 

-55 

-114 

-41 

137 

31 

-4 

84 

-43 

85 

111 

-11 

-26 

56 

32 

36 

39 

-27 

-9 

214 

98 

-13 

0 

33 

63 

-24 

44 

-29 

173 

83 

51 

-26 

34 

68 

-87 

104 

16 

43 

46 

82 

6 

36 

24 

-43 

39 

76 

-49 

-14 

29 

49 

36 

-42 

39 

-37 

64 

-94 

-32 

7 

51 

37 

-76 

46 

-37 

-27 

-66 

-42 

-14 

46 

38 

-26 

39 

-27 

-104 

-38 

28 

-7 

-19 

39 

64 

63 

-46 

-77 

-7 

67 

28 

-60 

40 

128 

-18 

-68 

-43 

32 

78 

73 

-19 

41 

124 

-32 

3 

-31 

26 

-13 

106 

27 

42 

-3 

-8 

66 

-26 

-26 

-108 

89 

87 

43 

-122 

20 

103 

21 

-46 

-39 

43 

71 

44 

-166 

7 

44 

23 

-37 

31 

19 

0 

46 

-120 

9 

-80 

0 

11 

105 

-50 

-58 

46 

-22 

24 

-101 

-6 

69 

125 

-94 

-108 

47 

120 

23 

-63 

-20 

60 

10 

-36 

-86 

48 

176 

21 

16 

25 

. -30 

-68 

-2 

-49 

49 

48 

38 

49 

19 

-94 

-45 

11 

-2 

60 


28 

-6 

-16 

-100 

-23 

-9 

-1 






M. G-. Kendall 


277 


Table 8. Series 6 d 
















278 


Tables of autoregressive series 
Table 9. Series 7 
































































































































M. G. Kendall 


279 


Table 10. Series 8 





















280 


Tables of autoregressive series 

Table 11. Series 9 


No . of 
term 

0 + 

50 + 

100 + 

150 + 

200 + 

260 + 

300 + 

350 + 

400 + 

450 + 

1 

2-390 



- 1-528 

1-023 

- 1-549 

- 0-093 

- 0-119 

- 0-069 

0-177 

2 

0-985 

- 1-147 

-1-110 

- 1-507 

- 0-098 

- 2-845 

- 1-162 

- 0-789 

0-054 

- 0-044 

3 


ip8l& J 

1-477 

- 1-378 

- 0-499 

- 0-766 

0-734 

- 0-560 

- 0-903 

- 0-493 

4 



3-374 

- 0-164 

- 0-493 

- 0-698 

- 0-108 

- 2-073 

- 0-603 

1-064 

5 

sBifif \ 


3-173 

0-405 

1-325 

0-223 

0-755 

- 2-348 

- 0-691 

. 0-688 

6 

- 1-457 



1-080 

- 0-916 

- 0-475 

-0-022 

- 1-104 

- 1-737 

0-658 

7 

fnpn 

- 1-760 

1-895 

1-358 

1-159 

- 0-521 

1-113 

- 1-271 

- 0-908 

- 0-236 

8 


- 1-660 


0-847 

0-391 

0-798 

- 1-301 

- 0-404 

0-731 

- 1-161 

9 

- 1-567 


- 1-578 

2-333 

- 0-093 

1-567 

1-160 

0-481 

1-316 

- 1-803 


- 1-654 


- 0-467 

2-121 

- 1-850 

0-929 

1-397 

0-588 

- 0-432 

- 3-082 

11 

- 2-416 


0-875 

0-822 

- 1-059 

1-927 

1-857 

0-865 

0-241 

- 1-691 

12 

- 2-821 

SHimU 


1-245 

- 1-014 

0-203 

1-502 

1-798 

0-387 

- 1-467 

13 


- 2-360 


2-618 

-0-112 

- 1-467 

0-882 

0-663 

- 0-414 

- 2-607 

14 

- 1-616 

- 1-644 


3-134 

0-992 

- 0-550 

1-052 

0-519 

1-696 

- 2-160 

15 

-2-112 

- 1-397 

1-186 

1-582 

0-913 

- 0-411 

2-911 

- 0-064 

1-998 

- 1-505 

16 


Hi® 

- 0-430 

1-099 

2-637 

1-384 

3-643 

0-133 

0-263 

- 0-722 

17 

- 1-806 

Mm J 

HOT; i 

- 0-230 

1-982 

0-591 

1-186 

0-441 

1-270 

- 0-876 

18 

fBosEFl 

1-679 

hba i !i 

1-113 

1-315 

0-809 

1-862 

- 0-974 

0-485 

1-045 

19 


0-732 

- 1-515 

Imra: !J 

1-054 

1-306 

0-112 

- 2-054 

- 1-974 

0-950 



1-093 



0-423 

3-434 

1-032 

0-386 

- 2-867 

0-683 

21 


1-068 

- 2-501 

- 1-891 

- 0-181 

3-421 

- 0-491 

0-826 

- 2-328 

- 0-594 

22 

-0-886 




- 0-464 

2-275 

1-032 

- 1-679 

- 1-682 

- 2-596 

23 

- 1-321 


—1-121 

- 1-459 

- 0-476 

2-355 

2-384 

- 1-119 

1-102 

- 1-126 

24 


HiiSJit® 

- 2-314 


1-183 

1-366 

0-624 

- 1-946 

2-350 

- 0-665 

25 

- 2-254 

- 0-694 

- 3-101 


0-973 

1-562 

- 0-917 

- 1-392 

1-344 

- 0-823 

26 



- 2-921 

1-390 

0-801 

- 0-026 

- 1-363 

- 0-470 

1-133 

- 0-469 

27 

0-272 

- 2-704 

- 1-972 


- 1-514 

0-640 

- 1-693 

- 0-341 

1-642 

- 2-195 

28 


- 1-164 

B 81 IM 

- 1-583 

- 0-455 

0-362 

- 1-606 

- 0-470 

0-800 

0-448 

29 


- 1-382 


- 1-577 

0-204 

0-665 

- 3-104 

- 0-384 

1-977 

- 0-573 

■a 

- 0-497 

- 1-815 

- 2-635 


- 0-132 

- 0-631 

- 3-599 

1-604 

0-233 

- 0-957 

31 



- 1-375 


0-512 

- 0-877 

- 1-665 

2-262 

- 0-732 

0-114 

32 

- 0-318 

1-277 


ISHH 

0-550 

0-195 

1-684 

1-352 

0-864 

-1-102 

33 


1-496 

- 0-713 

- 2-421 

0-717 

0-565 

2-632 

0-700 

0-571 

- 1-724 

34 

1-697 

1-785 

SH® ? ii 

0-261 

0-132 

0-470 

0-471 

1-276 

1-853 

- 1-961 

35 

2-585 


1853® [ 1 ] 


0-400 

0-352 

- 1-502 

1-128 

1-589 

- 1-539 

36 

0-170 

1-615 


1-259 

- 0-241 

1-325 

- 1-703 

1-682 

0-980 

-1-111 

37 


1-734 

- 1-692 

2-917 

0-136 

1-383 

- 2-083 

- 1-061 

1-530 

- 2-282 

38 


3-764 



1-115 

0-956 

- 2-674 

0-442 

1-110 

- 0-783 

39 

1-564 

1-262 

0-527 


- 0-044 

1-191 

- 2-365 

0-482 

- 0-024 

- 0-421 

40 

1-474 

1-540 



0-451 

0-806 

- 2-846 

0-395 

0-693 

-2-011 

41 



0-644 


0-503 

0-978 

- 4-160 

0-722 

0-102 

- 1-903 

42 





0-215 

0-707 

- 3-199 

0-477 

1-465 

- 2-617 

43 

1-088 


0-452 

0-166 

- 0-392 

- 0-319 

- 2-662 

0-646 

0-086 

- 2-433 

44 



0-955 


2-271 

0-747 

- 4-524 

1-893 

0-265 

- 2-710 

45 





3-453 

- 1-327 

- 3-303 

2-771 

0-186 

- 2-601 

46 

0-486 

- 1-966 

1-319 


2-055 

1-201 

- 1-198 

1-159 

1-571 

- 2-670 

47 





2-397 

- 0-848 

- 0-766 

- 1-065 

- 0-233 

-0-866 

48 

- 1-337 




2-095 

- 0-142 

0-347 

- 2-119 

- 0-151 

- 0-016 

49 

- 1-116 

2-390

1-924 

- 0-749 

0-432 

- M 17 

- 0-485 

0-940 

50 

- 1-675 

1-385 

- 1-725 

2-176 

1-331 

0-167 

0-961 

- 2-044 

0-596 

0-001 


The 501st and 502nd terms are 0·950 and 3·505 respectively.


























































M. G. Kendall 


281 


Table 12. Series 10

No. of
 term      0+     50+    100+    150+    200+    250+    300+    350+    400+    450+
    1  -0·767  -1·484   0·004  -4·339   0·422  -6·040  -0·388  -1·465  -0·607  -0·268
    2  -2·015   1·188   3·436  -2·353  -1·673  -4·886   0·067  -3·827  -0·394   0·644
    3  -1·877   2·449   6·949  -0·014  -0·726  -2·132   1·022  -5·830  -0·820   1·531
    4  -2·614   1·262   7·956   2·241  -0·877  -0·377   1·069  -5·604  -2·443   1·920
    5  -2·568  -1·597   7·172   3·830   0·557   0·130   1·778  -4·520  -3·185   1·111
    6  -2·281  -4·047   3·242   3·940   1·443   1·130   0·120  -2·574  -1·651  -0·888
    7  -2·797  -3·596  -1·598   4·751   1·216   2·745   0·403  -0·091   1·202  -3·336
    8  -3·690  -2·168  -3·836   5·378  -1·235   3·383   1·781   1·776   1·666  -6·307
    9  -4·967  -0·273  -2·545   4·362  -3·025   4·276   3·614   2·863   1·473  -6·961
   10  -6·489   0·037  -1·259   3·354  -3·724   3·215   4·587   4·060   1·174  -6·971
   11  -5·366  -2·183   0·417   4·127  -2·696  -0·068   4·121   3·697   0·141  -6·094
   12  -4·162  -4·064   3·104   5·996  -0·112  -2·233   3·291   2·666   1·264  -6·438
   13  -4·012  -4·776   4·392   6·116   2·138  -2·833   4·471   0·899   3·318  -4·640
   14  -3·934  -5·013   2·849   4·827   5·046  -0·616   6·910  -0·156   3·281  -3·107
   15  -4·127  -2·301   0·106   2·022   6·462   1·330   6·558  -0·180   3·220  -1·974
   16  -4·196   1·666  -1·617   0·924   5·901   2·580   5·618  -1·094   2·386  -0·428
   17  -3·613   3·703  -3·347   0·294   4·314   3·479   3·013  -3·168  -0·959   2·407
   18  -1·898   4·339  -4·741  -0·291   2·218   5·071   1·537  -2·551  -5·116   3·117
   19   0·266   3·979  -6·043  -2·369   0·102   8·250  -0·307  -0·400  -7·475   1·631
   20   0·354   3·744  -6·336  -5·022  -1·461   8·364  -0·074  -0·840  -7·347  -2·360
   21  -1·064   2·373  -5·069  -6·804  -2·134   7·431   2·456  -1·844  -3·242  -4·538
   22  -2·361   0·948  -4·722  -3·628  -0·434   5·358   3·363  -3·555   2·457  -4·477
   23  -4·320  -0·838  -6·761  -1·116   1·563   3·730   1·554  -4·380   5·668  -3·478
   24  -2·989  -2·423  -6·897   1·976   2·737   1·398  -1·335  -3·517   6·139  -2·057
   25  -0·856  -4·951  -6·678   2·985   0·715   0·213  -3·839  -2·019   6·561  -2·718
   26   0·911  -5·398  -6·913   0·712  -1·037  -0·113  -5·161  -0·933   3·848  -1·514
   27   2·411  -4·845  -8·374  -2·288  -1·294   0·434  -6·862  -0·400   3·429  -0·879
   28   1·700  -4·445  -8·390  -4·965  -1·037   0·003  -8·566   1·630   2·081  -1·167
   29  -0·414  -2·461  -6·417  -4·791   0·018  -1·091  -7·657   4·255  -0·158  -0·730
   30  -1·623   0·792  -4·902  -3·273   1·089  -1·006  -2·456   5·218  -0·350  -1·322
   31  -2·176   3·598  -2·896  -3·626   1·905   0·003   3·669   4·312   0·265  -2·813
   32   0·116   6·347  -1·258  -2·091   1·684   0·977   5·724   3·410   2·320  -4·394
   33   3·800   6·117  -0·379   0·632   1·299   1·426   2·965   2·723   4·008  -4·966
   34   4·292   4·571  -1·720   2·890   0·346   2·404  -1·364   2·973   4·229  -4·377
   35   3·318   4·203  -3·294   5·830  -0·133   3·315  -5·066   0·847   4·178  -4·613
   36   1·941   6·092  -2·512   5·986   0·796   3·400  -7·564  -0·112   3·591  -3·669
   37   2·030   5·852  -0·589   6·271   0·898   3·274  -8·163  -0·065   1·837  -2·151
   38   2·737   4·931   0·170   3·794   1·041   2·796  -8·032   0·380   0·919  -2·642
   39   1·186   3·306   1·126   1·350   1·199   2·417  -8·919   1·172   0·194  -3·624
   40   0·177   2·679   1·248  -1·322   1·013   1·967  -8·994   1·577   1·219  -6·332
   41   0·689   0·802   1·262  -1·963   0·123   0·637  -8·096   1·794   1·330  -6·487
   42   0·061  -0·476   1·719  -1·944   1·900   0·464  -8·932   3·078   1·108  -7·179
   43   0·162  -1·104   2·567  -1·282   5·481  -1·135  -9·081   5·260   0·740  -7·255
   44   0·634  -2·942   3·283  -1·535   7·134  -0·190  -6·721   5·406   1·831  -7·061
   45   0·093  -1·681   2·243  -1·055   7·604  -0·489  -3·617   2·252   1·411  -5·005
   46  -1·551   0·324   0·556  -0·414   6·782  -0·585  -0·272  -2·346   0·486  -1·992
   47  -2·869   2·456  -2·419   2·462   5·633  -1·148   1·942  -4·823  -0·656   1·252
   48  -4·055   3·924  -4·664   6·091   4·136  -0·803   3·233  -6·176  -0·369   2·374
   49  -4·119   2·699  -5·448   5·392   0·184  -0·403   2·466  -4·452   0·100   2·935
   50  -3·651  -0·113  -5·168   3·288  -4·711  -1·203   0·307  -1·755   0·250   5·547
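All the series tables share one layout. The column heading gives an offset (0+, 50+, ..., 450+) and the row number is added to it, so term t sits in row ((t - 1) mod 50) + 1 under the heading 50 * floor((t - 1)/50). The minimal Python sketch below implements that lookup. The helper names `locate` and `term` are illustrative, not from the source. The `TABLE_12` dictionary holds only a handful of Series 10 entries copied from Table 12.

```python
# Map a term number t (1..500) to its (row, column offset) cell in the
# 50-row layout used by the series tables, and read a few Series 10 entries.


def locate(t: int) -> tuple[int, int]:
    """Return (row number, column offset) for term t, e.g. term 51 -> (1, 50)."""
    if not 1 <= t <= 500:
        raise ValueError("the tables list terms 1 to 500 only")
    offset_index, row_index = divmod(t - 1, 50)
    return row_index + 1, 50 * offset_index


# Small excerpt of Table 12 (Series 10), keyed by (row, column offset).
TABLE_12 = {
    (1, 0): -0.767,    # term 1
    (2, 0): -2.015,    # term 2
    (1, 50): -1.484,   # term 51
    (1, 450): -0.268,  # term 451
    (50, 450): 5.547,  # term 500
}


def term(t: int) -> float:
    """Look up term t of Series 10 in the excerpt above."""
    return TABLE_12[locate(t)]


if __name__ == "__main__":
    for t in (1, 2, 51, 451, 500):
        print(t, locate(t), term(t))
```

The 501st and 502nd terms quoted under several tables lie outside this 500-cell grid, which is why `locate` rejects t > 500.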






Tables of autoregressive series 
Table 13. Series 11 


No . of 
term 

0 + 

50 + 

100 + 

150 +

200 +

250 +

300 + 

350 + 

400 + 

450 + 

1 

- 1-301

1-170 

0-612 

- 1-339

- 2-267 

2-498 

1-177 

- 0-871 

0-363 

- 0-076

2 

- 1-336 

0-076 

- 0-439 

-1-221

- 1-741 

2-166 

- 0-081 

- 1-167 

0-620 

0-114 

3 

- 1-576 

- 0-486 

- 0-305 

- 0-445 

- 0-616 

0-982 

- 0-848 

- 1-906 

0-562 

- 0-389 

4 

-1-122 

- 1-443 

- 0-782 

- 0-673 

0-862 

0-843 

- 2-310 

- 2-071 

0-961 

-0-221 

5 

0-162 

1-165 

- 0-081 

- 0-640 

- 1-180 

1-785 

- 0-003 

0-002 

- 0-628 

- 1-462 

6 

- 0-452 

0-808 

- 1-576 

0-642 

- 1-173 

1-081 

0-674 

- 1-759 

0-157 

0-848 

7 

- 1-075 

- 0-228 

0-026 

- 0-339 

- 0-415 

0-092 

0-801 

- 0-252 

- 0-138 

1-904 

8 

- 1-030 

- 0-816 

- 1-066 

1-731 

2-073 

0-293 

0-840 

0-112 

0-438 

0-800 

9 

0-178 

- 2-128 

0-996 

- 0-323 

0-656 

2-189 

0-029 

1-907 

1-097 

0-145 

10 

0-231 

- 2-151 

- 0-597 

1-207 

1-076 

1-534 

-0-001 

0-320 

0-839 

0-703 

11 

- 0-680 

- 0-541 

- 0-675 

0-795 

- 0-571 

1-171 

- 0-086 

- 1-103 

- 0-418 

0-706 

12 

- 2-478 

1-039 

1-200 

- 1-024 

2-018

0-508 

- 1-190 

1-378 

0-978 

- 0-547 

13 

1-091 

- 1-203 

0-101 

- 1-198

0-880 

0-346 

- 0-452 

2-367 

- 0-615 

- 0-550 

14 

2-273 

- 0-468 

0-470 

- 1-520 

- 0-293 

- 1-667 

- 0-444 

0-523 

-1-122 

- 0-300 

15 

2-614 

-0-012 

1-183 

- 1-053 

- 0-092 

0-285 

- 0-748 

- 0-309 

- 2-996 

0-079 

16 

2-545 

- 1-234 

- 0-604 

1-019 

- 0-271 

1-342 

-0-100 

- 0-759 

- 2-796 

- 0-427 

17 

2-644 

0-307 

- 0-125 

0-336 

0-069 

1-662 

-0-122 

- 0-571 

- 0-796 

- 0-293 

18 

0-086 

- 0-513 

- 0-464 

1-967 

- 0-784 

2-921 

0-255 

- 0-748 

- 0-130 

1-119

19 

- 1-679 

- 0-283 

- 0-774 

-1-686 

- 0-532 

2-086 

- 0-215 

0-012 

- 1-669 

-0-221 

20 

- 1-041 

0-141 

- 1-469 

- 0-291 

- 1-862 

2-610 

- 0-356 

- 0-715 

- 0-956 

0-345 

21 

- 1-309 

0-318 

- 1-032 

1-022 

- 2-056 

2-628 

1-874 



1 

22 

- 1-927 

1-169 

- 0-478 

1-817 

- 0-931 

2-307

-0-111 

- 0-551 

- 0-367 


23 

- 1-462 

0-416 

- 2-042 

2 - 6 S 9 

- 1-033 

1-079 

-1-866 




24 

- 1-523 

-0-316

- 0-420 

1-226 

1-356 

- 0-187 

- 0-707 


2-575 


25 

- 0-286 

0-372 

1-686 

1-083 

0-838 

0-323 

- 0-426 




26 

0-343 

- 0-913 

- 0-867 

- 0-299 

1-005 

1-073 

1-126 

- 0-434 



27 

- 2-558 

1-618 

0-467 

- 0-412 

0-847 

0-085 

0-587 

1-337 



28 

0-731 

0-742 

- 0-360 

1-914 

- 0-706 

0-225 

1-278 




29 

- 0-630 

0-217 

2-127 

1-122 

- 0-410 

0-926 

0-386 




30 

0-867 

0-979 

0-979 

- 1-742 

- 0-393 

1-236 

- 0-516 


0-764 


31 

0-666 

- 0-388 

-0-222 

- 1-146 

- 0-693 

0-453 

- 1-360


- 0-500 

- 2-443 

32 

0-062 

- 1-779 

- 1-313 

0-388 

0-206 

0-604 

1-101 

1-226 

0-161 

0-028 

33 

- 0-304 

0-743 

- 0-476 

2-669 

0-808 

2-384 

1-930 


- 0-785 


34 

- 0-244 

- 0-113 

- 0-786 

1-474 

1-058 

0-791 

0-486 

0-785 

0-511 


35 

1-079 

- 1-830 

0-682 

- 0-345 

- 0-703 

1-303 


- 1-835 

0-778 


36 

1-140 

- 2-154 

0-730 

- 1-160 

0-161 

0-368 

1-173 


- 0-236 

0-360 

37 

- 0-457 

- 1-261 

1-439 

- 0-987 

0-434 

- 0-465 

1-895 


- 0-561 


38 

- 0-654 

0-220 

0-727 

- 0-968 

0-638 

- 0-425 

1-273 


- 0-141 

- 0-498 

39 

0-698 

0-369 

0-191 

- 2-854 

0-614 

- 2-542 

0-729 


- 0-765 

0-593 

40 

1-176 

- 1-228 

- 1-452 

- 0-357 

- 0-276 

- 1-284 



- 1-177

-0-211 

41 

0-923 

- 0-925 

- 1-934 

- 0-483 

- 0-019 

- 2-451 



- 0-863 

- 0-009 

42 

- 1-984 

0-704 

- 0-920 

- 1-078 

- 1-858 

- 1-428 

0-333 


0-010 


43 

- 1-603 

0-205 

0-868 

- 1-335 

- 0-811 

0-169 

0-523 


- 0-620 

- 0-171 

44 

- 2-393 

- 1-136 

0-385 

0 - 58 S 

- 0-414 

1-360 

0-492 


0-385 

0-816 

45 

-2-101 

- 0-915 

- 0-658 

0-298 

- 0-257 

2-134 


- 0-293 

- 0-625 

0-907 

46 

- 1-377 

- 0-260 

0-485 

- 0-625 

1-004 

- 0-035 

- 0-184 

- 1-405 

1-051 

0-758 

47 

- 0-525 

0-373 

- 0-035 

- 0-599 

2-024 

- 0-553 

- 1-744 

- 2-076 

1-411

1-138 

48 

- 0-986 

0-776 

1-162 

- 0-978 

2-732 

1-849 

- 0-358 


1-312 

0-329 

49 

- 0-619 

0-926 

- 0-216 

-0-110 

2-918 

1-829 

- 1-500 


0-282 

- 0-563 

50 

- 1-290 

- 0-811 

- 2-405 

- 1-070 

3-102 

1-091 

- 2-366 


- 0-749 

1-225 


The 501st and 502nd terms are 0·501 and 0·445 respectively.



































Table 14. Series 12


No . of 
term 

0 + 

50 + 

100 + 

150 +

200 +

250 +

300 +

350 + 

400 + 

450 + 

1 

- 2-394 

0-486 

- 0-917 

- 3-592 

- 4-036 

4-305 

- 0-366 

- 3-727 

2-109 

-1-211 

2 

- 3-088 

- 1-386 

- 1-734 

- 2-264 

- 1-176 


- 3-694 

- 4-299 

2-603 


3 

- 2-038 

- 0-603 

- 1-530 

- 1-334 

- 0-462 


- 3-884 

- 2-863 

1-181 

- 1-961 

4 

- 1-150 

0-838 

- 2-391 

0-306 

- 1-093 

2-243 

- 1-751 

- 2-759 

0-155 


5 

- 1-321 

0-995 

- 1-839 

0-666 

- 1-386 


0-817 

- 1-865 

- 0-558 


6 

- 1-908 

- 0-140 

- 1-893 

2-309 

1-095 

0-854 

2-614 

- 0-549 

- 0-254 

3-462

7 

- 1-260 

- 2-780 

- 0-167 

1-885 

2-653 

2-364 

2-496 

2-230 

1-097 

3-962 

8 

-0-202 

- 5-139 

0-166 

2-126 

3-337 

3-707 

1-438 

3-048 

2-173 

3-379 

9 

- 0-172 

- 4-804 

- 0-409 

2-191 

1-823 

4-067 


1-075 

1-424 

2-447 

10 

- 2-566 

- 1-676 

0-067 

0-323 

2-355 

3-128 


1-036 

1-457 


11 

- 1-646 

- 0-644 

1-039 

- 1-938 

2-669 

1-753 

- 2-376 

2-070 

0-276 

- 1-273 

12 

1-746 

- 0-339 

1-280 

- 3-813 

1-344 

- 1-192 

- 2-239 

3-271 

- 1-547 

- 1-928 

13 

6-357 

- 0-063 

2-071 

- 4-279 

0-107 


1-805 

- 4-836 


14 

7-665 

- 1-134 

1-034 

- 1-781 

- 0-825 


- 0-409 

- 7-342 

- 1-009 

15 

8-287 

- 0-909 

- 0-023 

0-516 

- 0-892 

2-443 

- 0-437 

- 1-924 

- 6-454 

- 0-700 

16 

5-418 

- 0-946 

- 1-006 

3-426 

- 1-353 

5-686 


- 2-659 

- 3-559 


17 

0-138 

- 0-869 

- 1-869 

1-824 

- 1-574 

7-119 

0-419 

- 1-052 

- 2-357 


18 

- 3-699 

- 0-342 

- 3-022 

0-003 

- 2-917 

7-598 


- 1-532 

- 1-769 


19 

- 5-397 

0-376 

- 3-422 

0-113 

- 4-478 

7-326 

1-573 

- 0-832 

- 2-289 


20 

- 6-064 

1-764 

- 2-731 

1-940 

- 4-398 

6-567 

1-661 

- 0-700 

-2-001 


21 

- 5-434 

2-157 

- 3-336 

4-766 

- 3-632 

4-640 

- 0-826 

- 1-171 

- 0-558 

- 1-253 

22 

- 4-468 

1-181 

- 2-723 

5-499 

- 0-440 

1-633 

- 2-446 


2-961 

- 0-302 

23 

- 2-484 

0-692 

0-258 

4-749 

2-170 

-0-201 



4-636 

0-218 

24 

- 0-156 

- 0-852 

0-779 

2-175 

3-612 




5-621 

0-532 

25 

- 1-487 

0-386 

1-194 

- 0-394 

3-735 


1-251 

1-681 

3-188 

0-212 

26 

- 0-827 

1-691 

0-665 

0-393 

1-597 


2-967 

1-474 

0-876 

1-907 

27 

- 0-796 

1-775 

2-151 

1-752 

- 0-521 

1-313 



1-365 

2-495 

28 

0-395 

2-136 

3-063 

-0-012 

- 1-765 

2-453 

1-327 


1-816 

1-211 

29 

1-488 

1-074 

2-072 

- 2-034 

- 2-274 

2-494 

- 1-412 


0-821 

- 2-359 

30 

1-502 

-1-666 

- 0-566 

- 1-843 

- 1-413 

2-421 

- 1-116 

0-323 

0-155 

- 3-172 

31 

0-604 

- 1-626 

- 2-133 

1-658 

0-391 



1-127 

- 1-024 

- 0-940 

32 

- 0-331 

- 1-069 

- 2-850 

4-220 

2-194 

3-761 

2-593 

1-863 

- 0-693 

2-607 

33 

0-413 

- 2-193 

- 1-386 

3-468 

1-615 

3-540 

2-358 


0-527 

3-638 

34 

1-760 

- 4-032 

0-630 

0-545 

0-731 

2-381 

2-470 

- 1-463 

0-692 

3-058 

35 

1-272 

- 4-589 

2-825 

-2-122 

0-480 


3-433 

- 1-186 

- 0-064 

0-483 

36 

- 0-034 

- 2-813 

3-520 

- 3-574 

0-801 

- 1-182 

3-814 


- 0-557 

- 1-496 

37 

0-024 

- 0-430 

2-650 

- 5-725 

1-255 

- 4-039 



- 1-346 

- 1-294 

38 

1-220 

- 0-295 

- 0-297 

- 4-867 

0-704 

- 6-136 

1-029 


- 2-379 

-0-886 

39 

2-253 

- 1-034 

- 3-586 

- 2-975 

- 0-472 




- 2-807 

- 0-337 

40 

- 0-116 

- 0-286 

- 4-716 

- 2-516 

- 2-729 

- 5-549 


1-168 

-1-888 

- 0-631 

41 

- 2-857 

0-407 

- 2-537 

- 2-616 

- 3-577 

- 2-895 



- 1-294 

- 0-696 

42 

- 5-478 

- 0-545 

- 0-047 

- 1-031 

- 2-984 




- 0-094 

0-366 

43 

- 6-698 

- 1-718 

0-658 

0-472 

- 1-751 


1-538 


- 0-081 

1-657 

44 

- 6-006 

- 1-877 

1-123 

0-509 

0-570 

4-679 


- 1-696 

1-008 

2-398 

45 

- 3-783 

- 0-833 

0-921 

- 0-275 

3-626 

2-171 

- 1-412 

- 4-470 

2-561 

2-947 

46 

- 2-144 

0-798 

1-614 

- 1-535 

6-326 

1-947 

- 2-412 

- 4-342 

3-625 

2-372 

47 

- 1-086 

2-220 

1-099 

-1-661

8-114 

2-886 

- 3-447 

- 2-827 

2-989

0-573 

48 

- 1-413 

1-232 

- 2-004 

- 2-130 

8-864 

3-292 

- 4-952 

- 1-717 

0-726 

0-669 

49 

0-169 

0-857 

- 4-092 

- 3-769 

8-191 

3-355 

- 4-594 


- 0-772 

0-950 

50 

0-966 

-0-112 

- 4-721 

- 4-822 

6-745 

1-964 

- 3-745 

1-365 

- 1-098 

1-156 














































The 501st and 502nd terms are -0·662 and -0·337 respectively.















































































































Table 16. Series 14 


No . of 
term 

0 + 

50 + 

100 + 

150 +

200 + 

250 + 

300 + 

350 + 

400 + 

450 + 

1 

- 5-539 

- 1-403 


- 1-511 

3-610 

0-158 

0-581 

1-721 

0-234 

- 1-300 

2 

- 5-006 


- 0-547 

- 4-781 

3-981 

1-858 

- 0-194 

4-336 

- 0-887 

- 0-744 

3 

- 3-163

- 0-646 

- 4-945 

4-169 

0-350 

- 1-134 

5-063 

- 0-112 

- 0-195 

4 

- 0-204 


- 0-261 

- 3-141 

4-388 

- 0-761 

- 1-382 

2-554 

0-564 

0-697 

5

0-191 

- 1-255 

- 0-529 

- 1-972 

4-701 

- 2-444 

- 1-788 

0-030 

1-019 

0-449 

6 

0-420 

- 3-634 

- 0-421 


5-147 

- 2-883 

- 2-663 

- 2-449 

0-722 

0-861 

7 

1-379

- 3-793 

0-670 


4-241 

- 1-871 

- 4-077 

- 3-514 

0-529 

2-307 

8 

2-193 

- 2-407 


1-574 

1-470 

- 1-677 

- 3-078 

- 2-165 

1-175 

3-383 

9 

1-170 


0-107 

- 1-421 

- 1-575 

- 2-219 

1-927 

1-643 

10 

- 0-112 



- 0-506 

0-421 

- 1-647 

- 2-189 

2-386 

- 0-230 

11 

- 1-468 




- 1-750 

0-480 

- 1-006 

- 2-089 

1-737 

- 2-896 

12 

- 2-674 


0-223 

0-119 

- 0-229 

- 1-079 

- 0-367 

- 3-493 

13 

- 2-501 

2-167 

2-652 


2-225 

0-773 

- 0-465 

- 0-968 

0-305 

- 2-598 

14 

- 2-281 

3-741 



1-324 

0-744 

0-992 

- 0-715 

0-661 

- 1-036 

15 

- 1-651 

2-269 

2-936 

- 4-055 

0-634 

0-520 

0-576 

- 0-966 

0-656 

0-063 

16 

- 0-638 


2-854 

- 2-264 

0-216 

1-585 

0-377 

- 1-677 

- 0-437 

1-240 

17 

0-626 


2-499 

- 0-923 

0-376 

1-411 

- 0-207 

- 2-924 

- 0-672 

- 0-391 

18 

0-172 

- 0-780 

2-093 


1-726 

0-233 

- 0-364 

- 2-738 

- 0-939 

- 1-869 

19 

0-821 

- 1-649 



4-069 

1-024 

1-470 

- 3-226 

0-806 

- 2-232 

20 

1-167 

- 2-614 


- 1-816 

4-315 

0-842 

1-856 

- 3-306 

2-895 

- 2-727 

21 

1-026 

- 1-514 

- 3-798 


5-351 

- 0-546 

0-372 

- 1-107 

2-326 

- 2-404 

22

0-160 


- 3-101 


5-597 

- 2-906 

- 1-074 

- 0-353 

0-091 

- 1-722 

23 

- 0-831 

0-183 


1-275 

4-228 

- 5-235 

- 0-138 

- 0-616 

0-080 

0-324 

24 

- 0-766 

- 1-466 

0-219 

3-674 

2-163 

- 6-456 

2-776 

- 0-221 

- 2-362 

0-357 

25 

0-130 

- 2-710 


4-683 

0-501 

- 6-323 

2-270 

- 0-066 

- 2-549 

- 0-251 

26 

- 1-331 

- 4-376 


4-192 

- 0-687 

- 3-900 

0-370 

0-217 

- 0-401 

0-138 

27 

- 0-758 

- 3-663 

- 3-236 

1-474 

- 2-697 

- 1-796 

0-073 

1-859 

1-211 

1-504 

28 

0-005 

- 1-999 

- 3-342 


- 2-575 

0-804 

0-585

1-462 

0-969 

1-340 

29 

1-656 


- 2-054 

1-119 

- 1-035 

2-197 

2-906 

- 0-131 

- 0-116 

1-593 

30 

2-359 

1-151 

- 1-854 


- 1-571 

1-709 

4-519 

- 1-128 

- 2-024 

3-689 

31 

2-874 

2-387 



- 3-973 

0-392 

3-571 

- 0-034 

- 2-014 

3-304 

32 

1-122 


1-011 


- 3-956 

- 0-032 

0-939 

0-324 

- 0-580 

1-861 

33 

- 0-936 

1-966 



- 2-088 

- 1-198 

- 0-625 

1-813 

1-817 

1-291

34 

- 0-461 


1-408 

1-548 

- 0-866 

- 1-731 

- 1-466 

2-283 

3-545 

0-467 

35 

0-601 

1-275 


1-279 

- 0-547 

- 0-070 

- 1-322 

4-467 

3-932 

- 1-470

36 

0-112 

1-461 

1-183 

1-326 

- 1-169 

0-339 

- 0-625 

3-325 

2-898 

- 1-653 

37 

- 0-913 



1-807 

- 1-107 

0-676 

- 0-705 

- 0-090 

2-195 

0-706 

38 

- 1-621 




0-327 

0-743 

- 1-019 

- 1-803 

2-484 

0-733 

39 

- 0-660 

- 4-255 



- 0-067 

1-608 

- 2-565 

- 3-218 

1-980 

- 0-856 

40 

0-172 




- 0-475 

1-541 

- 3-883 

- 6-195 

0-147 

- 1-636 

41 

0-604 

- 2-726 

- 1-856 


- 1-272 

0-698 

- 2-696

- 5-332 

- 2-187 

- 0-854 

42 

1-492 

- 1-312 

- 1-399 


- 2-709 

1-632 

- 0-944 

- 3-091 

- 2-808 

- 0-196 

43 

2-632 

- 0-336 

- 1-087 

- 1-368 

- 3-975 

1-656 

0-550 

- 0-878 

- 1-491 

1-500 

44 

0-338 



- 1-940 

- 2-749 

0-470 

1-112 

- 0-737 

0-786 

1-365 

45

- 2-794 



- 1-468 

0-629 

0-159 

- 1-772 

- 2-859 

2-191 

- 0-723 

46 

- 3-049 

- 0-160 


- 0-270 

0-789 

- 1-976 

- 3-349 

- 3-933 

2-387 

- 2-248 

47 

- 1-367 


3-221 


- 0-277 

- 1-925 

- 3-907 

- 3-279 

1-150 

- 2-684 

48 

0-193 

- 0-697 

2-630 


- 1-514 

- 0-126 

- 3-360 

- 0-833 

0-613 

- 2-840 

49 

- 0-152 

- 1-443 

2-111 

3-263 

- 2-181 

1-184 

- 2-433 

2-218 

- 0-046 

- 2-444 

50 

- 1-707 

- 0-674 

1-333

3-846 

- 1-479 

1-186 

- 0-198 

1-065 

- 1-982 

- 1-605 












































Table 17. Series 15 


No . of
term 

0 +

50 +

100 + 

150 + 

200 + 

250 + 

300 + 

350 + 

400 + 

450 + 

1

-0-855 

2-075 


-0-544 

-0-486 

-0-089 

0-764 

-r 0-399 

-1-032 





0-014 



1 

-1-831 




-0-174 

-0-677 


-0-794 



0-061 



0-416 



-1-566 



5 

1-259 

-0-004 

-0-965 


0-586 

2-122 

0-462 

-2-577 

0-736 


6 

0-129 

-0-867 

-1-428 


0-676 


-1-579 


7 

-0-476 

-1-817 



-0-591 

3 

1-336 


8

-0-810 

0-970 

-1-698 

1-143 

0-472 





1-684 

9 

-0-306 

0-773 


1-236 

0-668 


1-207 



1-821 

10 

0-173 

0-541 

-0-871 


0-049 

-1-147 





11 

-0-801 

0-494 


0-586 


-1-216 




12 

-0-220 

-0-342 



0-105 


-1-249 

0-392 



13 

-1-618

-0-060 


1-116 

2-150 

-1-696 



2-015 

14 

-1-287 

-0-093 

0-170 

-0-976 

0-188 

-1-157 



-1-159 

-0-905 

15 

-0-032 

1-034 

1-433 


-0-410 


-1-579 

-1-212 

2-104 


16 

-0-571 

0-246 

-1-273 

1-714 

0-794

-0-511 

1-242 




17 

1-816 

-0-193

-1-877 

-0-508 

0-574 



-1-335


18 

0-342 

0-507 


1-063 

0-977 


0-160 




19 

0-620 

0-325 

-0-033 

1-193 

0-640 

-0-801 

-1-077 

0-851 

0-015 

-1-121

20

0-406 

-0-040 

0-274 

2-321 

0-241 

-0-199 

-1-159 

0-772 

0-202 

1-217 

21

-1-088 

0-311 


0-097 

-0-602 



-0-080 

0-402 

1-649 

22 

-0-106 

-0-314 


0-455 

0-525 

-1-104 


0-863 

0-183 


23 

-0-205 

0-741 

-0-271 

-1-979 

1-219 


0-154 

0-685 


-0-960 

24 

-0-417 

2-126 


0-621 

0-069 



-0-363 

-0-777 

-0-235 

25 

-0-544 

-0-835 


1-357 

-0-012 


1-136 

1-064 

0-967 

-0-436 

26 

0-651 

-1-895 

-0-117 

0-396 

-0-520 

0-181 

0-254 

-1-096 

1-182 

-0-120 

27 

0-719 

0-037 

-0-485 

-0-368 

1-384 

0-460 

0-227 

-0-316 


1-201 

28 

-1-160 

-0-068 

-0-549 

-1-309 

1-984 


0-712 

-1-149 

-1-687 


29 

0-396 

1-332 

-0-237 

-0-290 

-1-423 



0-313 

-1-123 


30 

0-394 

0-157 


-0-102 

-1-370 


-0-166 

-0-734 

0-449 


31 

0-460 

-0-181 

0-569 


-0-077 

-0-939 

-0-787 

1-414 

0-664 


32 

0-341 

-0-271 

1-470 

1-916 

1-032 

0-127 

-0-385 

-1-084 

0-676 

2-151 

33 

-0-549 

-0-730 

1-052 

-0-687 

1-015 


0-727 

-1-149 

-0-275 


34 

-0-536 

-1-176

0-127

1-232


-0-930 

1-308 

35

-1-602 

1-663 

1-389 

-0-808

-0-370 

0-557 


1-848 

36 

-0-042 

1-645 

0-098 

-0-759 



1-277 

0-179 


1-186 

37 

-0-013 

-0-326 

0-356 

-0-565 



0-190 

1-018 



38 

-0-051 

-0-067 

-1-255 

-1-378 

1-998 


-0-472 

-0-861 



39 

1-128 

0-505 

0-303 

0-052 


-2-273 


-2-163 


-0-048 

40 

1-635 

0-505 

1-444 

1-183 

1-312 


0-783 

0-756 

-0-482 


41 

-1-567 

-0-639 


1-848 



-0-706 

-0-647 

-1-108 

-0-328 

42 

1-396 


1-345 




1-226 

0-732 


0-104 

43 

1-783 

1-117

1-922

1-447


44 

1-125 

-1-148 

-1-120 


0-211 

1-270 



45

1-029 

0-618 





0-245 

-0-055 

0-518

46 

0-002 

-0-148 



1-367 

-2-728 

-1-504 

0-964 



47 

0-362 

-0-109 

1-131 

-0-813 

-0-926 

-1-953 

0-369 

-0-373 



48 

1-398 




-2-743 


-0-853 

-1-207 


-1-359 

49 

0-329 


0-850 




0-308 

0-902 

0-456 


50 

-0-602 

0-665

1-359

-0-441

-1-006 


The 501st and 502nd terms are 0·663 and 0·452 respectively.



















































































Table 18. Series 16 


No . of 
term 

0 +

50 +

100 + 

150 + 

200 + 

250 +

300 + 

350 + 

400 + 

450 + 

1 


2-563 

- 0-784 



0-782 

1-212 

2-643 

- 4-404 

- 1-514 

2 

3-961 

2-048 

- 1-840 

- 0-842 


- 0-229 

1-060 

0-389 

- 4-962 

- 0-623 

3 



- 0-439

1-479 

1-022 

- 3-470 

- 2-520 

- 0-478 

4 

2-641 

- 1-487 

- 3-376 



2-379 

- 0-085 

- 5-591 

0-545 

0-548 

5 



0-864

1-475 

0-732 

- 4-392 

2-663 

-0-112 

6 

- 1-705 

- 2-287 

- 3-348 



0-155 

1-083 

- 2-209 

3-704 

1-287 

7 

- 2-375 

0-076 

- 0-134 

3-301 

- 2-173

2-033 

- 0-352 

3-730 

3-293 

8 

- 1-587 

1-768 


2-950 


- 3-615 

1-701 

- 0-099 

1-326 

2-896 

9 

- 1-369 

2-401 




- 2-885 

- 0-361 

1-063 

- 0-677 

0-508 


- 0-922 

1-415 

-0-668 

3 


- 1-086 

- 2-497 

1-611 

- 0-981 

- 1-862 

11 

- 1-952 


-1-101 

1-569 

2-399 

- 4-262 

0-638 

- 0-995 

- 0-276 

12 

- 2-974 

- 0-464 


- 2-360 

1-479 

2-024 

- 2-711 

- 0-333 

- 1-703 

- 0-283 

13 

- 2-327 


1-829 

- 1-439 


0-329 

- 2-430 

- 1-897 

0-662

- 1-242 

14 

- 1-644 


0-592 

1-311 


- 1-161 

- 0-076 

- 1-416 

1-912 

-1-111 

15 

1-171 

0-696 

- 2-140 



- 1-954 

2-836 

- 1-942 

2-794 

- 1-123 

16 

2-452 

0-720 

- 2-421 

2-218 

1-745 

- 1-682 

3-317 

- 1-781 

2-191 

- 0-742 

17 

2-732 

0-819 

-1-626 

2-804 

2-092 

- 1-674 

1-154 

- 0-137 

1-028 

- 1-376 

18 

2-186 


4-298 

1-670 

-1-200 

- 1-548 

1-512 

0-237 

0-075 

19 

- 0-051 

0-452 


3-422 

0-189 

- 1-420 

- 3-281 

1-651 

0-149 

2-419 


- 1-314 

- 0-067 

1-317 


-0-102 

- 2-066 

- 2-276 

1-924 

0-228 

2-651 

21 




1-012 

- 0-536 

- 0-709 

1-975 

0-780 

0-740 

22 

- 1-548 

2-645 


- 1-968 

1-224 

0-167 

0-471 

0-848 

- 0-034 

- 0-740 

23 

- 1-434 

1-864 

1-152 

-0-102 

0-828 

- 1-633 

2-009 

1-009 

0-540 

- 1-622 

24 

- 0-152 

- 1-178 

1-224 

1-208 

-0-221 

- 1-693 

2-228 

- 0-410 

1-793 

- 1-534 

25 

1-268

- 2-180 

0-286 


0-727 

- 0-586 

1-074 

- 1-272 

1-069 

0-325 

26

0-311 

- 1-884 

- 0-847 

- 0-758 

2-894 

0-673 

1-440 

- 2-343 

- 1-307 

1-269 

27 



- 1-311 


1-397 

0-524 

0-066 

- 1-628 

- 3-096 

0-985 

28 

0-353 

1-487 

- 1-673 

- 1-562 

- 1-280 

0-686 

- 0-813 

- 1-354 

- 2-303 

1-067 

29 


1-278 


- 0-472 

- 2-184 

- 0-447 

- 1-714 

0-739 

- 0-421 

-0-121 

30 




2-173 

- 0-730 

- 0-708 

- 1-864 

0-406 

1-364 

1-489 

31 


- 0-944 

3-176 

1-939 

1-304 

0-181 

- 0-467 

- 1-072 

1-436 

1-056 

32 




- 0-975 

3-031 

0-309 

1-474 

- 2-312 

2-206 

1-470 

33 


- 0-517 

2-848 

- 2-957 

1-874 

- 0-529 

1-485 

- 1-460 

1-130 

2-937 

34 

- 2-609 

2-182 

1-846 

- 3-524 

1-190 

- 1-019 

2-173 

- 0-260 

- 0-794 

3-681 

35 

- 1-461 

2-334 

0-963 

- 2-963 

2-232 

- 0-729 

1-838 

1-457 

- 2-613 

0-759 

36 



- 1-119 

- 2-875 

3-858 

- 0-223 

0-463 

0-872 

- 3-121 

- 1-468 

37 

1-416 


- 1-409 

- 1-629 

2-959 

- 2-154 

0-051 

- 1-933 

- 1-857 

- 2-042 

38 

3-393 

0-876 

0-453 

0-828 

2-638 

- 2-989 

0-607 

- 1-806 

- 0-964 

- 1-349 


1-468 

- 0-164 

3-249 

3-674 

2-078 

- 4-117

- 0-063 

- 1-667 

- 1-240 

- 0-791 


1-314 


4-693 

3-877 

2-361 

- 3-747 

0-864 

- 0-199 

- 2-515 

- 0-092 

41 

2-495 


4-654 

2-884 

1-805 

-2-668 

2-892 

2-062 

- 2-841 

- 0-795 

42 

3-212 

- 1-908 



1-509 

- 0-920 

2-966 

3-637 

- 1-908 

0-040 

43 

3-315 

- 1-189 

- 0-613 

- 2-624 

0-535 

0-748 

2-061 

2-915 

- 0-707 

0-960 

44 



- 0-862 


1-200 

- 1-446 

- 0-720 

2-352 

- 0-444 

0-122 

45 


- 3-353 

0-127 

- 3-917 

- 1-453 

0-757 

- 0-304 

- 0-292 

46 

1-412 


1-859 


- 3-203 

- 3-804 

- 2-092 

- 1-551 

- 0-119 

- 1-741 

47 

1-412 

- 1-547 

2-675 


- 3-135 

- 1-377 

- 1-266 

- 1-182 

0-477 

- 1-774 

48 

0-345 

- 1-264 

2-299 

0-838 

- 2-173 

0-555 

1-012 

- 0-960 

1-107 

- 2-087

49 

1-749 

- 2-379 

0-647 


- 0-912 

2-062 

1-347 

- 1-504 

1-277 

- 0-746 


2-864 

- 1-177 


0-069 

2-761 

1-904 

- 1-732 

- 1-009 

0-675 


iq 


Biometrika 36 









































































Table 19. Values of c_k and r_k for series 1

  k         c_k     r_k     k       c_k     r_k     k       c_k     r_k
  0   1,217,170   1·000    17    70,902   0·006    34    54,121   0·048
  1     927,053   0·763    18    37,770   0·032    35     9,302   0·008
  2     458,107   0·378    19    -8,109  -0·007    36   -41,850  -0·031
  3      90,479   0·080    20   -29,750  -0·020    37   -52,745  -0·041
  4     -80,933  -0·007    21    -6,035  -0·005    38   -26,534  -0·024
  5     -95,200  -0·079    22    22,188   0·019    39   -19,038  -0·011
  6     -47,312  -0·039    23    12,301   0·011    40   -33,766  -0·030
  7      -8,895  -0·007    24   -12,771  -0·011    41   -43,076  -0·039
  8      20,685   0·022    25   -47,794  -0·041    42   -13,075  -0·012
  9      21,168   0·018    26   -91,940  -0·080    43    37,682   0·034
 10     -42,728  -0·030    27  -125,573  -0·109    44    61,317   0·056
 11    -123,352  -0·104    28  -146,002  -0·127    45    53,467   0·048
 12    -173,350  -0·140    29  -134,177  -0·117    46    44,736   0·041
 13    -153,577  -0·130    30   -86,031  -0·075    47    45,666   0·041
 14     -82,428  -0·063    31   -20,509  -0·018    48    38,801   0·036
 15      34,070   0·029    32    44,838   0·030    49    24,436   0·022
 16      82,284   0·070    33    77,980   0·060    50     7,831   0·001
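The printed r_k in Table 19 are not simply c_k/c_0. For k = 1, c_1/c_0 = 927,053/1,217,170 rounds to 0·762, but the table gives 0·763. Most legible entries are reproduced instead when each sum is divided by its number of products: r_k = [c_k/(N - k)]/[c_0/N], with N = 500. The same relation fits the first rows of Table 22 for series 8, 12 and 16. This formula is inferred from the printed numbers; it is not stated in this part of the paper. A few entries do not fit it, for example k = 4 and k = 17, probably because of misprinted or misread digits. The minimal Python check below uses an illustrative function name, `serial_correlation`.

```python
# Check tabulated serial correlations of series 1 (Table 19) against the
# inferred relation r_k = (c_k / (N - k)) / (c_0 / N), with N = 500 terms.

N = 500
C0 = 1_217_170  # c_0 for series 1


def serial_correlation(c_k: float, c_0: float, k: int, n: int = N) -> float:
    """Mean lag-k product (c_k over n - k pairs) divided by the mean square."""
    return (c_k / (n - k)) / (c_0 / n)


# (k, c_k, printed r_k) copied from Table 19.
CHECKS = [
    (1, 927_053, 0.763),
    (2, 458_107, 0.378),
    (5, -95_200, -0.079),
    (11, -123_352, -0.104),
    (16, 82_284, 0.070),
    (34, 54_121, 0.048),
]

if __name__ == "__main__":
    for k, c_k, printed in CHECKS:
        r = serial_correlation(c_k, C0, k)
        print(f"k={k:2d}  computed={r:.3f}  printed={printed:.3f}")
    # The unadjusted ratio misses the printed value at k = 1.
    print(f"c_1/c_0 = {927_053 / C0:.3f}")
```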


Table 20. Values of c_k and r_k for series 2, 3 and 4



Series 2

k

c_k

0 

836,617 

1 

709,079 

2 

506,325 

3 

291,381

4 

104,876

5

-36,275 

6 

-102,830 

7 

-121,940 

8 

-114,325 

9 

-76,997 

10 

-40,795 

11

-20,232 

12 

-11,782 

13 

-7,589 

14 

-32,356 

15 

-61,104

16 

-102,551 

17 

-131,650 

18 

-130,400 

19 

-120,194 

20 

-78,730 

21 

-23,001 

22 

38,088

23 

76,008 

24 

108,169 

25 

141,980 

26 

137,720 

27 

99,392 

28 

72,308 

29 

71,157 

30 

68,511 


r_k

1-000
0-852 
0-010 
0-353 
0-128 
-0-044 
-0-126 
-0-160 
-0-142 
- 0-096
-0-068
- 0-033
-0-015 
-0-010 
0 - 0-11 
-0-078 
0132 
- 0-169 
-0-177 
-0-156 
-0-103 
-0-030 
0-050 
0-099 
0-144 
0-190 
0-185 
0-134 
0-098 
0-097 
0-094


Series 3

c_k


039,563 

580,821 

-97,646 

- 563,764

- 549,785

- 164,843

235,473
362,975

221,837

- 10,213
-178,603

- 200,113
119,942
-8,147

111,156

107,156
140,275

48,177

- 511,858
...125,095

-88,301
21,297

122,829
157,567
81,340

- 63,776

- 171,262

- 108,798
-06,814

63,524

147,371 


r_k

1-000
0-621
-0-105
- 0-608
- 0-595 
-0-179 
0-257 
0-398
0-244
- 0-011
-0-198
-0-223
-0-134
-0-009
0-126 
0-190 
0-160 
0-055 
- 0-069 
-0-145 
-0-103 
0-025 
0-144 
0-185 
0-090 
-0-070 
-0-204 
- 0-202 
-0-081 
0-077 
0-179 


Series 4

c_k, r_k


480,312 

1-000 

-327,751 

-0-685 

83,209 

0-175 

124,239 

0-262 

-195,590 

-0-414 

142,973 

0-304 

- 59,201 

-0-126 

-3,399 

-0-007 

7,358 

0-016 

29,175 

0-063 

- 54,608 

-0-110 

48,217 

0-105 

- 16,371 

-0-036 

-8,168 

-0-018 

-2,033 

-0-006 

19,702 

0-044 

-21,054 

-0-048 

-2,210 

-0-005 

30,250 

0-068 

-41,995 

-0-095 

20,538 

0-060 

0,283 

0014 

-28,023 

-0-064 

40,618 

0-107 

-51,382 

-0-119 

22,938 

0-053 

42,239 

0-099 

-90,303 

-0-212 

90,937 

0-214 

-33,782 

-0-080 

-26,722 

-0-061 






M. G. Kendall 

Table 21. Values of $c_k$ and $r_k$ for series 5 and 6

Series   5a          5b          5c          5d          5 (in toto)
$c_0$    977,837     958,436     879,736     872,844     3,688,853
$c_1$    739,993     702,480     607,461     619,552     2,672,905
$r_1$    0.7568      0.7329      0.6905      0.7098      0.7246
$c_2$    364,400     282,372     168,243     223,038     1,043,493
$r_2$    0.3727      0.2946      0.1912      0.2555      0.2829
$c_3$    58,786      -60,840     -128,204    -98,570     -221,266
$r_3$    0.0601      -0.0635     -0.1457     -0.1129     -0.0600
$c_4$    -90,575     -243,083    -198,042    -276,361    -804,161
$r_4$    -0.0926     -0.2536     -0.2251     -0.3166     -0.2180

Series   6a          6b          6c          6d          6 (in toto)
$c_0$    1,551,606   1,505,293   1,548,545   1,186,705   5,792,149
$c_1$    924,569     903,589     951,497     750,232     3,531,836
$r_1$    0.5959      0.6003      0.6144      0.6322      0.6098
$c_2$    -270,379    -220,471    -194,686    -48,120     -720,610
$r_2$    -0.1743     -0.1465     -0.1257     -0.0405     -0.1244
$c_3$    -1,021,112  -927,370    -976,079    -604,650    -3,509,827
$r_3$    -0.6581     -0.6161     -0.6303     -0.5095     -0.6060
$c_4$    -872,747    -793,867    -901,618    -645,513    -3,206,192
$r_4$    -0.5625     -0.5274     -0.5822     -0.5440     -0.5535

Table 22. Values of $c_k$ and $r_k$ for series 8, 10, 12, 14 and 16

Columns: $k$; then $c_k$ and $r_k$ for each of Series 8, Series 10, Series 12, Series 14 and Series 16

0 

12 , 406-47 

1-000 

6414-470 

1-000 

3346-111 

1000 

2116-829 

1-000 

1672-212 

IB 

I 

11 , 751-63 

0-949 

6707-452 

0-892 

2815-257 

0-843 

1707-930 

0-809 

1253-272 

0-751 

2 

10 , 182-50 

0-824 

4131-158 

0-647 

1714-249 

0-514 

973-967 

0-462 

489-083 

0-294 

3 

8 , 303-08 

0-673 

2407-876 

0-378 

650-804 

0-196 

358-001 

0-170 

- 133-027 

nmiimil 

4 

6 , 590-10 

0-636 

1004-645 

0-167 

- 49-417 

- 0-015 

2-327 

0-001 

- 387-373 

- 0-234 

5 

6 , 299-99 

0-432 

302-223 

0-048 

- 408-133 

- 0-123 

- 132-306 

- 0-063 

- 311-083 

- 0-188 

6 

4 , 410-17 

0-360 

39-600 

0-000 

- 548-680 

-0166 

- 143-107 

- 0-058 

- 80-278 

- 0-052 

7 

3 , 865-31 

0-316 

90-769 

0-014 

- 532-595 

- 0-161 

- 143-734 

- 0-069 

126-455 

0-076 

8 

3 , 502-73 

0-287 

274-386 

0-043 

- 305-760 

- 0-120 

- 144-603 

-0 069 

245-460 

0-149 

9 

3 , 211-80 

0-264 

471-979 

0-076 

- 188-798 

- 0-057 

- 116-614 

- 0-056 

271-914 


10 

2 , 959-40 

0-243 

624-129 

0-099 

36-728 

0-011 

- 49-845 

- 0-024 

197-670 

0-121 

11 

2 , 767-02 

0-228 

725-064 

0-116 

189-907 

0-058 

13-060 

0-000 

80-693 

\'Y, SjJ 

12 

2 , 660-60 

0-220 

771-678 

0-123 

231-281 

0-071 

28-263 

0-014 

- 11-174 

P®) ■ 

13 

2 , 627-86 

0-217 

768-202 

0-123 

182-206 

0-056 

- 4-922 

- 0-002 

- 49-764 


14 

2 - 632-93 

0 - 2.18 

682-319 

0-109 

150-430 

0-046 

- 61-316 

- 0-030 

- 30-803 

IS? 3! 

16 

2 , 639-50 

0-219 

550-040 

0-088 

175-750 

0-054 

- 119-775 

- 0-058 

- 8-137 


16 

2 , 632-87 

0-219 

396-900 

0-064 

196-577 

0-061 

- 147-510 

- 0-072 

- 9-094 

SSflTs s 

17 

2 , 570-85 

0-216 

221-944 

0-036 

192-876 

0-060 

- 148-196 

- 0-073 

- 26-163 

- 0-016 

18 

2 , 426-49 

0-203 

67-235 

0-011 

190-296 

0-059 

- 102-048 

- 0-050 

- 63-165 

- 0-033 

19 

2 , 201-66 

0-184 

- 40-242 

- 0-007 

219-116 

0-068 

- 21-133 

- 0-010 

- 64-609 

- 0-034 

20 

1 , 924-49 

0-162 

- 131-413 

- 0-021 

308-266 

0-096 

60-353 

0-030 

- 23-027 

- 0-014 

21 

1 , 622-42 

0-137 

- 216-876 

- 0-035 

410-601 

0-130 

141-538 

0-070 

69-210 

0-043 

22 

1 , 300-73 

0-110 

- 294-638 

- 0-048 

497-343 

0-155 

181-089 

0-090 

169-082 

0-106 




- 354-742 

- 0-068 

501-373 

0-167 

159-227 

0-079 

226-251 





- 416-650 

- 0-068 

453-351 

0-142 

114-061 

0-057 

229-216 

■3EH 


347-31 

0-029 

- 465-492 

- 0-076 

357-632 

0-113 

52-435 

0-026 

194-919 

0-123 

26 

150-48 


- 430-809 

- 0-071 

199-598 

0-063 

- 13-199 

- 0-007 

140-516 

Hr Tfl 

27 

23-52 


- 272-333 

- 0-045 

- 24-634 

- 0-008 

- 20-853 

- 0-013 

51-111 


28 

- 63-64 


- 6-663 

- 0-001 

- 221-975 

- 0-070 

17-009 

0-00 9 

- 88-074 

B 

\m 



282-596 

0-047 

- 292-376 

- 6-093 

89-160 

0-045 

- 240-431 


30 

- 241-18 

Bill 

503-641 

0-084 

- 248-584 

- 0-079 

164-033 

0-082 

- 330-713 

- 0-210 


































[ 290 ] 


TABLES FOR USE IN COMPARISONS WHOSE ACCURACY 
INVOLVES TWO VARIANCES, SEPARATELY ESTIMATED 

By ALICE A. ASPIN 


Introduction 


The tables are designed for use when the precision of an estimate, $y$, of a population parameter, $\eta$, depends linearly on two population variances, $\sigma_1^2$ and $\sigma_2^2$, the sampling variance of $y$ being therefore of the form $(\lambda_1\sigma_1^2 + \lambda_2\sigma_2^2)$, where $\lambda_1$ and $\lambda_2$ are known positive constants. If $s_1^2$ and $s_2^2$ are estimates of $\sigma_1^2$ and $\sigma_2^2$, based on $f_1$ and $f_2$ degrees of freedom, respectively, then the tables give, for two probability levels, critical values of the ratio

$$v = \frac{y-\eta}{\sqrt{\lambda_1 s_1^2 + \lambda_2 s_2^2}}.$$


The tables are based on the assumptions of normal theory, the methods used in their calculation having been described by B. L. Welch (1947) and A. A. Aspin (1948). In particular, it is assumed that $s_1^2$ and $s_2^2$ are distributed independently of each other and of $y$.

Even under the restrictive assumptions of normal theory, it is not possible to table critical values of $v$ which are fixed independently of all the sample statistics. For a given probability level, $\varepsilon$, these critical values depend on $f_1$ and $f_2$ and also on the ratio

$$c = \frac{\lambda_1 s_1^2}{\lambda_1 s_1^2 + \lambda_2 s_2^2},$$

i.e. on the relative magnitudes of the observed sample variances. The function to be tabled, therefore, involves four arguments and may be written

$$v\left\{f_1,\ f_2,\ \frac{\lambda_1 s_1^2}{\lambda_1 s_1^2 + \lambda_2 s_2^2},\ \varepsilon\right\}.$$



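It has the property that the probability

$$\Pr\left[\frac{y-\eta}{\sqrt{\lambda_1 s_1^2+\lambda_2 s_2^2}} > v\left\{f_1,\ f_2,\ \frac{\lambda_1 s_1^2}{\lambda_1 s_1^2+\lambda_2 s_2^2},\ \varepsilon\right\}\right] \qquad (1)$$

equals $\varepsilon$, whatever may be the values of the unknown population variances, $\sigma_1^2$ and $\sigma_2^2$.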

Two probability levels ($\varepsilon = 0.05$ and $\varepsilon = 0.01$) are tabled on separate pages, the object in each case being to cover, in a small compass, a wide range of the other arguments involved. Direct linear interpolation should be used everywhere except, for each $f$, in the panel which includes $f = \infty$. Here harmonic linear interpolation (i.e. interpolation linear in $1/f$) will be needed.
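The interpolation rule can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions rather than Mrs Aspin's own procedure: the four rows in the hypothetical `TABLE1_EXCERPT` are copied from Table 1 ($\varepsilon = 0.05$) and keyed by $(f_2, f_1)$, interpolation is done first in $c$, then in $f_1$, then in $f_2$, and "harmonic" interpolation is taken to mean interpolation linear in $1/f$ with $1/\infty = 0$. The helper names are illustrative only.

```python
import math

# Excerpt of Table 1 (epsilon = 0.05), keyed by (f2, f1).
# Each row lists v at c = 0.0, 0.1, ..., 1.0.
TABLE1_EXCERPT = {
    (10, 8):  [1.81, 1.78, 1.76, 1.74, 1.72, 1.72, 1.73, 1.76, 1.79, 1.82, 1.86],
    (10, 10): [1.81, 1.78, 1.76, 1.73, 1.72, 1.71, 1.72, 1.73, 1.76, 1.78, 1.81],
    (15, 8):  [1.75, 1.73, 1.72, 1.71, 1.71, 1.71, 1.73, 1.76, 1.79, 1.82, 1.86],
    (15, 10): [1.75, 1.73, 1.72, 1.71, 1.70, 1.70, 1.72, 1.73, 1.76, 1.78, 1.81],
}


def interp_c(row, c):
    """Direct linear interpolation in c on the grid 0.0, 0.1, ..., 1.0."""
    i = min(int(c * 10), 9)   # index of the grid point at or below c
    t = c * 10 - i            # fractional distance towards the next grid point
    return row[i] + t * (row[i + 1] - row[i])


def interp_f(f, fa, va, fb, vb, harmonic=False):
    """Interpolate between tabled degrees of freedom fa and fb.

    With harmonic=True the interpolation is linear in 1/f (1/inf taken as 0),
    as required in a panel that contains f = infinity."""
    def x(g):
        if not harmonic:
            return g
        return 1.0 / g if math.isfinite(g) else 0.0
    return va + (x(f) - x(fa)) / (x(fb) - x(fa)) * (vb - va)


def v_from_excerpt(f1, f2, c):
    """v(f1, f2, c) for 8 <= f1 <= 10 and 10 <= f2 <= 15, using only the excerpt."""
    t = TABLE1_EXCERPT
    v_at_f2_10 = interp_f(f1, 8, interp_c(t[(10, 8)], c), 10, interp_c(t[(10, 10)], c))
    v_at_f2_15 = interp_f(f1, 8, interp_c(t[(15, 8)], c), 10, interp_c(t[(15, 10)], c))
    return interp_f(f2, 10, v_at_f2_10, 15, v_at_f2_15)


# The case used in the Example below: f1 = 9, f2 = 14, c = 0.352
print(round(v_from_excerpt(9, 14, 0.352), 2))
# Harmonic interpolation in the c = 1.0 column, between f1 = 20 (1.72) and f1 = infinity (1.64)
print(round(interp_f(40, 20, 1.72, math.inf, 1.64, harmonic=True), 2))
```

The first line prints 1.71, the value quoted in the Example; the second prints 1.68, obtained by treating $1/f$ rather than $f$ as the interpolation variable.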

Some remarks on the accuracy of the tables and of certain approximations, which may be used for other probability levels, are made later in an Appendix by B. L. Welch (in particular see §6).


Example 


If $(\alpha_1 - \alpha_2)$ is the difference between the means of two normal populations whose standard deviations cannot reasonably be assumed equal, and if $(\bar{x}_1 - \bar{x}_2)$ is the difference between the two means of samples of sizes $n_1$ and $n_2$ taken respectively from the populations, then we may put

$$v = \frac{(\bar{x}_1-\bar{x}_2)-(\alpha_1-\alpha_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}, \qquad f_1 = (n_1-1), \quad f_2 = (n_2-1), \qquad c = \frac{s_1^2/n_1}{s_1^2/n_1 + s_2^2/n_2},$$




Table 1. Value of $v = (y-\eta)/\sqrt{\lambda_1 s_1^2 + \lambda_2 s_2^2}$ exceeded with probability $\varepsilon = 0.05$* [or of $|v|$ exceeded with probability $2\varepsilon = 0.10$]

Columns give $c = \lambda_1 s_1^2/(\lambda_1 s_1^2 + \lambda_2 s_2^2)$.

$f_2$  $f_1$  0.0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0

6    6    1.94  1.90  1.85  1.80  1.76  1.74  1.76  1.80  1.85  1.90  1.94
6    8    1.94  1.90  1.85  1.80  1.76  1.73  1.74  1.76  1.79  1.82  1.86
6    10   1.94  1.90  1.85  1.80  1.76  1.73  1.73  1.74  1.76  1.78  1.81
6    15   1.94  1.90  1.85  1.80  1.76  1.73  1.71  1.71  1.72  1.73  1.75
6    20   1.94  1.90  1.85  1.80  1.76  1.73  1.71  1.70  1.70  1.71  1.72
6    ∞    1.94  1.90  1.85  1.80  1.76  1.72  1.69  1.67  1.66  1.65  1.64

8    6    1.86  1.82  1.79  1.76  1.74  1.73  1.76  1.80  1.85  1.90  1.94
8    8    1.86  1.82  1.79  1.76  1.73  1.73  1.73  1.76  1.79  1.82  1.86
8    10   1.86  1.82  1.79  1.76  1.73  1.72  1.72  1.74  1.76  1.78  1.81
8    15   1.86  1.82  1.79  1.76  1.73  1.71  1.71  1.71  1.72  1.73  1.75
8    20   1.86  1.82  1.79  1.76  1.73  1.71  1.70  1.70  1.70  1.71  1.72
8    ∞    1.86  1.82  1.79  1.75  1.72  1.70  1.68  1.66  1.65  1.65  1.64

10   6    1.81  1.78  1.76  1.74  1.73  1.73  1.76  1.80  1.85  1.90  1.94
10   8    1.81  1.78  1.76  1.74  1.72  1.72  1.73  1.76  1.79  1.82  1.86
10   10   1.81  1.78  1.76  1.73  1.72  1.71  1.72  1.73  1.76  1.78  1.81
10   15   1.81  1.78  1.76  1.73  1.72  1.70  1.70  1.71  1.72  1.73  1.75
10   20   1.81  1.78  1.76  1.73  1.71  1.70  1.69  1.69  1.70  1.71  1.72
10   ∞    1.81  1.78  1.76  1.73  1.71  1.69  1.67  1.66  1.65  1.65  1.64

15   6    1.75  1.73  1.72  1.71  1.71  1.73  1.76  1.80  1.85  1.90  1.94
15   8    1.75  1.73  1.72  1.71  1.71  1.71  1.73  1.76  1.79  1.82  1.86
15   10   1.75  1.73  1.72  1.71  1.70  1.70  1.72  1.73  1.76  1.78  1.81
15   15   1.75  1.73  1.72  1.70  1.70  1.69  1.70  1.70  1.72  1.73  1.75
15   20   1.75  1.73  1.72  1.70  1.69  1.69  1.69  1.69  1.70  1.71  1.72
15   ∞    1.75  1.73  1.72  1.70  1.68  1.67  1.66  1.65  1.65  1.65  1.64

20   6    1.72  1.71  1.70  1.70  1.71  1.73  1.76  1.80  1.85  1.90  1.94
20   8    1.72  1.71  1.70  1.70  1.70  1.71  1.73  1.76  1.79  1.82  1.86
20   10   1.72  1.71  1.70  1.69  1.69  1.70  1.71  1.73  1.76  1.78  1.81
20   15   1.72  1.71  1.70  1.69  1.69  1.69  1.69  1.70  1.72  1.73  1.75
20   20   1.72  1.71  1.70  1.69  1.68  1.68  1.68  1.69  1.70  1.71  1.72
20   ∞    1.72  1.71  1.70  1.68  1.67  1.66  1.66  1.65  1.65  1.65  1.64

∞    6    1.64  1.65  1.66  1.67  1.69  1.72  1.76  1.80  1.85  1.90  1.94
∞    8    1.64  1.65  1.65  1.66  1.68  1.70  1.72  1.75  1.79  1.82  1.86
∞    10   1.64  1.65  1.65  1.66  1.67  1.69  1.71  1.73  1.76  1.78  1.81
∞    15   1.64  1.65  1.65  1.65  1.66  1.67  1.68  1.70  1.72  1.73  1.75
∞    20   1.64  1.65  1.65  1.65  1.66  1.66  1.67  1.68  1.70  1.71  1.72
∞    ∞    1.64  1.64  1.64  1.64  1.64  1.64  1.64  1.64  1.64  1.64  1.64

* $y$ is normally distributed about $\eta$ with variance $(\lambda_1\sigma_1^2 + \lambda_2\sigma_2^2)$, and $s_1^2$ and $s_2^2$ are independent estimates of $\sigma_1^2$ and $\sigma_2^2$, based on $f_1$ and $f_2$ degrees of freedom, respectively. $\lambda_1$ and $\lambda_2$ are known constants.

In the problem of comparing the means of samples taken from two normal populations, put $y = (\bar{x}_1 - \bar{x}_2)$, $f_1 = (n_1 - 1)$, $f_2 = (n_2 - 1)$, $\lambda_1 = 1/n_1$ and $\lambda_2 = 1/n_2$, where $n_1$ and $n_2$ are the sample sizes.






Table 2. Value of $v = (y-\eta)/\sqrt{\lambda_1 s_1^2 + \lambda_2 s_2^2}$ exceeded with probability $\varepsilon = 0.01$* [or of $|v|$ exceeded with probability $2\varepsilon = 0.02$]

Columns give $c = \lambda_1 s_1^2/(\lambda_1 s_1^2 + \lambda_2 s_2^2)$.

$f_2$  $f_1$  0.0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0

10   10   2.76  2.70  2.63  2.56  2.51  2.50  2.51  2.56  2.63  2.70  2.76
10   12   2.76  2.70  2.63  2.56  2.51  2.49  2.49  2.52  2.57  2.62  2.68
10   15   2.76  2.70  2.63  2.56  2.51  2.48  2.47  2.48  2.52  2.56  2.60
10   20   2.76  2.70  2.63  2.56  2.51  2.47  2.45  2.45  2.47  2.49  2.53
10   30   2.76  2.70  2.63  2.56  2.50  2.46  2.43  2.42  2.42  2.44  2.46
10   ∞    2.76  2.70  2.63  2.56  2.50  2.44  2.40  2.36  2.34  2.33  2.33

12   10   2.68  2.62  2.57  2.52  2.49  2.49  2.51  2.56  2.63  2.70  2.76
12   12   2.68  2.62  2.57  2.52  2.48  2.47  2.48  2.52  2.57  2.62  2.68
12   15   2.68  2.62  2.57  2.52  2.48  2.46  2.46  2.48  2.52  2.56  2.60
12   20   2.68  2.62  2.57  2.52  2.48  2.45  2.44  2.45  2.47  2.49  2.53
12   30   2.68  2.62  2.57  2.52  2.47  2.44  2.42  2.41  2.42  2.44  2.46
12   ∞    2.68  2.62  2.57  2.51  2.46  2.42  2.38  2.36  2.34  2.33  2.33

15   10   2.60  2.56  2.52  2.48  2.47  2.48  2.51  2.56  2.63  2.70  2.76
15   12   2.60  2.56  2.52  2.48  2.46  2.46  2.48  2.52  2.57  2.62  2.68
15   15   2.60  2.56  2.51  2.48  2.45  2.45  2.45  2.48  2.51  2.56  2.60
15   20   2.60  2.56  2.51  2.48  2.45  2.43  2.43  2.44  2.46  2.49  2.53
15   30   2.60  2.56  2.51  2.47  2.44  2.42  2.41  2.41  2.42  2.44  2.46
15   ∞    2.60  2.56  2.51  2.47  2.43  2.40  2.37  2.35  2.34  2.33  2.33

20   10   2.53  2.49  2.47  2.45  2.45  2.47  2.51  2.56  2.63  2.70  2.76
20   12   2.53  2.49  2.47  2.45  2.44  2.45  2.48  2.52  2.57  2.62  2.68
20   15   2.53  2.49  2.46  2.44  2.43  2.43  2.45  2.48  2.51  2.56  2.60
20   20   2.53  2.49  2.46  2.44  2.42  2.42  2.42  2.44  2.46  2.49  2.53
20   30   2.53  2.49  2.46  2.44  2.42  2.40  2.40  2.40  2.42  2.43  2.46
20   ∞    2.53  2.49  2.46  2.43  2.40  2.38  2.36  2.34  2.33  2.33  2.33

30   10   2.46  2.44  2.42  2.42  2.43  2.46  2.50  2.56  2.63  2.70  2.76
30   12   2.46  2.44  2.42  2.41  2.42  2.44  2.47  2.52  2.57  2.62  2.68
30   15   2.46  2.44  2.42  2.41  2.41  2.42  2.44  2.47  2.51  2.56  2.60
30   20   2.46  2.43  2.42  2.40  2.40  2.40  2.42  2.44  2.46  2.49  2.53
30   30   2.46  2.43  2.42  2.40  2.39  2.39  2.39  2.40  2.42  2.43  2.46
30   ∞    2.46  2.43  2.41  2.39  2.37  2.36  2.35  2.34  2.33  2.33  2.33

∞    10   2.33  2.33  2.34  2.36  2.40  2.44  2.50  2.56  2.63  2.70  2.76
∞    12   2.33  2.33  2.34  2.36  2.38  2.42  2.46  2.51  2.57  2.62  2.68
∞    15   2.33  2.33  2.34  2.35  2.37  2.40  2.43  2.47  2.51  2.56  2.60
∞    20   2.33  2.33  2.33  2.34  2.36  2.38  2.40  2.43  2.46  2.49  2.53
∞    30   2.33  2.33  2.33  2.34  2.35  2.36  2.37  2.39  2.41  2.43  2.46
∞    ∞    2.33  2.33  2.33  2.33  2.33  2.33  2.33  2.33  2.33  2.33  2.33

* $y$ is normally distributed about $\eta$ with variance $(\lambda_1\sigma_1^2 + \lambda_2\sigma_2^2)$, and $s_1^2$ and $s_2^2$ are independent estimates of $\sigma_1^2$ and $\sigma_2^2$, based on $f_1$ and $f_2$ degrees of freedom, respectively. $\lambda_1$ and $\lambda_2$ are known constants.

In the problem of comparing the means of samples taken from two normal populations, put $y = (\bar{x}_1 - \bar{x}_2)$, $f_1 = (n_1 - 1)$, $f_2 = (n_2 - 1)$, $\lambda_1 = 1/n_1$ and $\lambda_2 = 1/n_2$, where $n_1$ and $n_2$ are the sample sizes.





where $s_1^2$ and $s_2^2$ are the sample estimates of variance. The tables may then be used to make inferences about $(\alpha_1 - \alpha_2)$.

Thus if $n_1 = 10$, $n_2 = 15$, $\bar{x}_1 = 73.4$, $\bar{x}_2 = 47.1$, $s_1^2 = 51$ and $s_2^2 = 141$, we shall have $(\bar{x}_1 - \bar{x}_2) = 26.3$, $f_1 = 9$, $f_2 = 14$ and

$$v = \frac{26.3-(\alpha_1-\alpha_2)}{3.81}, \qquad c = \frac{\lambda_1 s_1^2}{\lambda_1 s_1^2 + \lambda_2 s_2^2} = 0.352.$$

From the tables

$$v\{9,\ 14,\ 0.352,\ 0.05\} = 1.71.$$


If it were a matter of testing the consistency of the data with the hypothesis that $\alpha_1 = \alpha_2$, we should have $v = 26.3/3.81 = 6.90$, which is clearly far beyond the tabled 5 % point.

In obtaining an interval estimate for $(\alpha_1 - \alpha_2)$ we should note that in the above description of the tables we have been dealing explicitly with one-sided probabilities. The chance that $v$ exceeds the tabled value numerically, either in the positive or negative direction, is $2\varepsilon$. Thus

$$\Pr\left[-1.71 < \frac{26.3-(\alpha_1-\alpha_2)}{3.81} < 1.71\right] = 0.90,$$

i.e.

$$\Pr\left[19.8 < (\alpha_1-\alpha_2) < 32.8\right] = 0.90,$$

i.e. the 90 % confidence limits for $(\alpha_1 - \alpha_2)$ are 19.8 and 32.8.
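The arithmetic of this example can be checked in a few lines of Python. This is only a sketch: the critical value 1.71 is read from Table 1 rather than computed, and the variable names are illustrative.

```python
import math

n1, n2 = 10, 15
xbar1, xbar2 = 73.4, 47.1
s1_sq, s2_sq = 51.0, 141.0

lam1, lam2 = 1 / n1, 1 / n2            # lambda_i = 1/n_i when comparing two means
f1, f2 = n1 - 1, n2 - 1                # degrees of freedom of s1^2 and s2^2
diff = xbar1 - xbar2                   # y = xbar1 - xbar2 estimates alpha1 - alpha2
se = math.sqrt(lam1 * s1_sq + lam2 * s2_sq)
c = lam1 * s1_sq / (lam1 * s1_sq + lam2 * s2_sq)

v_tab = 1.71                           # v{9, 14, 0.352, 0.05}, read from Table 1
# |v| exceeds v_tab with probability 2 * 0.05, so this interval has 90 % coverage
lower, upper = diff - v_tab * se, diff + v_tab * se

print(f"f1 = {f1}, f2 = {f2}, c = {c:.3f}, denominator = {se:.2f}")
print(f"90 % limits: {lower:.1f} and {upper:.1f}")
```

It prints $f_1 = 9$, $f_2 = 14$, $c = 0.352$, denominator 3.81, and limits 19.8 and 32.8, matching the figures above.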


APPENDIX 

Further note on Mrs Aspin's tables and on certain approximations to the tabled function
By B. L. WELCH 

1. More than one solution has been proposed for the problem of dealing with comparisons whose estimated precision involves two separate estimates of variance. The present tables are based on a solution which relies only on probability calculations of the so-called 'direct' variety. I have given elsewhere a method of developing this solution in a power series in $1/f_i$ (Welch, 1947), and further terms in the series have been given by A. A. Aspin (1948). In calculating the tables, Mrs Aspin has in general utilized the series as far as terms of order $1/f_i^3$, and, in certain doubtful cases, has also evaluated terms of order $1/f_i^4$. Taking the series thus far, it is possible to give $v$ to two decimal places down to the lowest values of $f_1$ and $f_2$ shown in the tables, but, for smaller $f_1$ and $f_2$, two-decimal accuracy could not, in general, have been guaranteed. For the larger values of $f_1$ and $f_2$ more figures could, of course, have been given, but there are advantages, in a table of this character, in keeping to the same number of figures throughout.

It is natural to ask what degree of accuracy in the probability is implied by the use of values of $v$ rounded off to two decimal places. For $f_1$ and $f_2$ large, this question can easily be answered. For then $v$ tends to be normally distributed, and, on the normal curve, two-decimal accuracy in the deviate, in the vicinity of $\varepsilon = 0.05$, implies three-decimal accuracy in the probability of the deviate being exceeded. In the vicinity of $\varepsilon = 0.01$, two-decimal accuracy in the deviate implies errors in the probability of not more than 2 units in the fourth decimal place.





For lower values of $f_1$ and $f_2$ it is not possible to be dogmatic, although it is very often the case, when we are using series of the present type, that the greater the struggle needed to obtain extra decimal places in the deviate, the less will be the effect upon the probability itself of any rounding-off error in the deviate. Although I believe this to be true in the present instance it has seemed to me, nevertheless, advisable to make some direct calculations, using quadratures, to test out the question thoroughly. Accordingly, for the lowest values $f_1 = f_2 = 6$ given in the table for $\varepsilon = 0.05$, I have made some calculations of this nature and will present the results below. It will be convenient at the same time, for the same particular case, to give the results of similar calculations which throw light on the accuracy of certain simple approximations to the tabled function.


2. If we are confining ourselves to a given pair of values, $f_1$ and $f_2$, and to a given probability level, $\varepsilon$, the function $v\{f_1, f_2, \lambda_1 s_1^2/(\lambda_1 s_1^2 + \lambda_2 s_2^2), \varepsilon\}$ need, momentarily at least, be regarded only as a function of the one argument $c = \lambda_1 s_1^2/(\lambda_1 s_1^2 + \lambda_2 s_2^2)$ and may conveniently be denoted simply by $v(c)$. Since it is not the function $v(c)$ itself that we are concerned with, but the tabular values obtained after rounding off $v(c)$ to two decimal places, it will be convenient to have a further symbol, $v_T(c)$ say, to denote these rounded-off values. In the first line of Table A we reproduce from Mrs Aspin's Table 1 the values $v_T(c)$ for the case $f_1 = f_2 = 6$, $\varepsilon = 0.05$. Let us now consider the probability that $v$ exceeds $v_T(c)$.


Table A. Critical values of $v$ ($f_1 = f_2 = 6$; $\varepsilon = 0.05$)

$c$                          0.0 or 1.0   0.1 or 0.9   0.2 or 0.8   0.3 or 0.7   0.4 or 0.6   0.5
Tabled values $v_T(c)$       1.94         1.90         1.85         1.80         1.76         1.74
Approximation $v_1(c)$       1.94         1.88         1.84         1.81         1.79         1.78
Approximation $v_1'(c)$      1.94         1.87         1.82         1.79         1.77         1.76


If $\Pr\{v > v_T(c)\}$ depends on $\sigma_1^2$ and $\sigma_2^2$ at all, it must depend on them through the ratio $\gamma = \lambda_1\sigma_1^2/(\lambda_1\sigma_1^2 + \lambda_2\sigma_2^2)$. If, then, we take different values of $\gamma$, and calculate for each one, directly by quadrature, the value $\Pr\{v > v_T(c)\}$, we shall have solved our problem. This calculation by quadrature is heavy, although it is lightened considerably by the use of some rather neglected tables of the probability integral of 'Student's' distribution (Gosset, 1925). It is not proposed to enter into the details of the calculations, but it should be mentioned that a rather broad network was used in the quadrature and that the last decimal place given below cannot, on this account, be fully guaranteed.

The following results were obtained. For $\gamma$ = 0.1, 0.2, 0.3, 0.4 and 0.5 the respective values for $\Pr\{v > v_T(c)\}$ were 0.0501, 0.0500, 0.0500, 0.0498 and 0.0498. It is clear, therefore, that to three decimals we certainly have $\Pr\{v > v_T(c)\} = 0.050$, whatever $\gamma$, and indeed that the error can, at most, be only a unit or two in the fourth decimal place. It seems, therefore, safe to assume that throughout Table 1 the rounding off to two decimal places in the deviate leaves us easily with three decimal places in the probability, 0.050, that the deviate is exceeded. Similarly in Table 2 we can almost everywhere expect that the probability that $v$ exceeds the tabled values will be 0.0100 to four decimal places.
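The same check can be imitated by simulation. The sketch below swaps Monte Carlo sampling in for Welch's quadrature and rests on two assumptions of ours: $v_T(c)$ between the tabled points of Table A is obtained by linear interpolation in $c$, and the values for $c > 0.5$ follow from the symmetry $v(c) = v(1-c)$, which holds when $f_1 = f_2$. Without loss of generality it sets $\lambda_1\sigma_1^2 = \gamma$ and $\lambda_2\sigma_2^2 = 1-\gamma$; the function names are hypothetical.

```python
import random

# Rounded critical values v_T(c) for f1 = f2 = 6, eps = 0.05, at c = 0.0, 0.1, ..., 1.0.
# The first six come from Table A; the rest follow from v(c) = v(1 - c) when f1 = f2.
V_T = [1.94, 1.90, 1.85, 1.80, 1.76, 1.74, 1.76, 1.80, 1.85, 1.90, 1.94]


def v_T(c):
    """Assumed rule: linear interpolation in c between the tabled points."""
    i = min(int(c * 10), 9)
    t = c * 10 - i
    return V_T[i] + t * (V_T[i + 1] - V_T[i])


def exceedance(gamma, f1=6, f2=6, n=100_000, seed=0):
    """Monte Carlo estimate of Pr{v > v_T(c)} for a given gamma.

    Takes lambda1*sigma1^2 = gamma and lambda2*sigma2^2 = 1 - gamma, so y - eta ~ N(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        d = rng.gauss(0.0, 1.0)                                  # y - eta
        a = gamma * rng.gammavariate(f1 / 2, 2.0) / f1           # lambda1 * s1^2 (chi-square_f1 / f1)
        b = (1.0 - gamma) * rng.gammavariate(f2 / 2, 2.0) / f2   # lambda2 * s2^2
        c = a / (a + b)
        if d / (a + b) ** 0.5 > v_T(c):
            hits += 1
    return hits / n


for gamma in (0.1, 0.3, 0.5):
    print(gamma, exceedance(gamma, n=50_000))
```

The estimates should sit close to 0.05; with 50 000 draws the Monte Carlo standard error is about 0.001, so this cannot resolve the fourth-decimal differences that Welch reports.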





3. The provision of tables of $v$ covering a large number of probability levels would clearly be a major task, and possibly one which would not justify the labour involved. It is therefore of some importance to consider the order of accuracy attainable with any approximation which only utilizes existing tables. A simple approximation of this character is to take for the critical $v$ the value given in a 'Student' $t$ table corresponding to degrees of freedom $F$, where

$$\frac{1}{F} = \frac{c^2}{f_1} + \frac{(1-c)^2}{f_2}.$$

Consideration of the series development of the 'Student' deviate shows that this procedure is legitimate if terms of order $1/f_i^2$ can be neglected.

Numerically, in the particular case $f_1 = f_2 = 6$ with $\varepsilon = 0.05$, this approximation is equivalent to taking for $v(c)$ the values $v_1(c)$ shown in the second line of Table A. By direct evaluation, using quadratures, it is possible now to obtain the probability that $v > v_1(c)$. Results of this calculation, for different $\gamma$, are shown in the second line of Table B. In this case an error of one or two units in the third decimal place of the probability is apparently possible with this approximation.

Table B

$\gamma$                     0.0 or 1.0   0.1 or 0.9   0.2 or 0.8   0.3 or 0.7   0.4 or 0.6   0.5
$\Pr\{v > v_T(c)\}$          0.050        0.050        0.050        0.050        0.050        0.050
$\Pr\{v > v_1(c)\}$          0.050        0.051        0.050        0.04?        0.048        0.048
$\Pr\{v > v_1'(c)\}$         0.050        0.052        0.051        0.05?        0.050        0.050
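The approximation of §3 can be sketched with the Python standard library alone. Two assumptions should be flagged: the 'Student' percentage point for a fractional $F$ is computed from the standard asymptotic expansion in $1/F$ (Abramowitz and Stegun, formula 26.7.5) instead of being read from a printed $t$ table, and the function names are hypothetical. For $f_1 = f_2 = 6$ and $\varepsilon = 0.05$ it should reproduce the $v_1(c)$ line of Table A.

```python
from statistics import NormalDist


def welch_df(c, f1, f2):
    """Degrees of freedom F with 1/F = c^2/f1 + (1 - c)^2/f2."""
    return 1.0 / (c * c / f1 + (1.0 - c) ** 2 / f2)


def student_upper_point(eps, df):
    """Approximate upper-eps point of Student's t for (possibly fractional) df,
    from the asymptotic expansion in powers of 1/df (A&S 26.7.5)."""
    x = NormalDist().inv_cdf(1.0 - eps)
    g1 = (x**3 + x) / 4
    g2 = (5 * x**5 + 16 * x**3 + 3 * x) / 96
    g3 = (3 * x**7 + 19 * x**5 + 17 * x**3 - 15 * x) / 384
    g4 = (79 * x**9 + 776 * x**7 + 1482 * x**5 - 1920 * x**3 - 945 * x) / 92160
    return x + g1 / df + g2 / df**2 + g3 / df**3 + g4 / df**4


def v1(c, f1, f2, eps=0.05):
    """'Student' approximation to the critical value v(f1, f2, c, eps)."""
    return student_upper_point(eps, welch_df(c, f1, f2))


# c = 0.0, 0.1, ..., 0.5 for f1 = f2 = 6
print([round(v1(k / 10, 6, 6), 2) for k in range(6)])
# The worked example of Mrs Aspin's paper: f1 = 9, f2 = 14, c = 5.1/14.5
print(round(v1(5.1 / 14.5, 9, 14), 2))
```

The first line prints [1.94, 1.88, 1.84, 1.81, 1.79, 1.78], the second line of Table A, and the second prints 1.71, the same as the tabled value used in the example.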

As $f_1$ and $f_2$ increase, the approximation using the 'Student' table improves, and for practical purposes this procedure is therefore extremely useful. The approximation can also be used in conjunction with the above-mentioned tables of the probability integral of the 'Student' distribution (Gosset, 1925) to give actual probabilities, if it is felt that reference to a few percentage points is not sufficient.

4. To order $1/f_i$, the above-described procedure will not be altered if we enter the 'Student' $t$ table with some other number of degrees of freedom, $F'$ (say), which differs from $F$ only to order $1/f_i$. A possible modification (see Welch, 1947, p. 32, equation (29)) is to take $F'$ given by

$$\frac{1}{F'+2} = \frac{c^2}{f_1+2} + \frac{(1-c)^2}{f_2+2}.$$

A form like this is suggested by the behaviour of the successive groups of terms in the true series solution. An expansion of this series in terms of $1/(f_i+2)$, $1/\{(f_i+2)(f_i+4)\}$, etc., instead of powers of $1/f_i$, appears to introduce some simplification into the algebra. Numerical calculations, however, show this gain to be largely illusory (Aspin, 1948), and the corresponding 'Student' approximation, using $F'$, not to have the advantages expected.

In the third line of Table A are shown the values $v_1'(c)$ which result when $F'$ is used, and the third line in Table B shows values of $\Pr\{v > v_1'(c)\}$ calculated by quadrature. In the neighbourhood of $\gamma = \tfrac{1}{2}$, $F'$ has some advantage, but, taking the overall picture, the accuracy given is of the same order as that given by using the $F$ of the previous section. Since $F$ is, perhaps, rather simpler to use, we have therefore no very good reason to introduce a modification of the present nature.
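The two degrees-of-freedom rules can be compared directly. This minimal sketch (function names hypothetical) evaluates $F$ from §3 and $F'$ from the formula of §4. Both reduce to $f_1$ at $c = 1$ and to $f_2$ at $c = 0$; for $f_1 = f_2 = 6$ and $c = 0.5$ they give $F = 12$ and $F' = 14$.

```python
def welch_df(c, f1, f2):
    """F with 1/F = c^2/f1 + (1 - c)^2/f2."""
    return 1.0 / (c * c / f1 + (1.0 - c) ** 2 / f2)


def modified_df(c, f1, f2):
    """F' with 1/(F' + 2) = c^2/(f1 + 2) + (1 - c)^2/(f2 + 2)."""
    return 1.0 / (c * c / (f1 + 2) + (1.0 - c) ** 2 / (f2 + 2)) - 2.0


for f in (6, 60):
    for c in (0.0, 0.1, 0.3, 0.5):
        F, F_mod = welch_df(c, f, f), modified_df(c, f, f)
        print(f"f1 = f2 = {f:2d}, c = {c:.1f}: F = {F:6.2f}, F' = {F_mod:6.2f}, F'/F = {F_mod / F:.3f}")
```

The difference $F' - F$ stays bounded while the ratio $F'/F$ approaches 1 as $f$ grows, which is consistent with the two rules agreeing to the order considered.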






5. Approximations, utilizing tables of the 'Student' distribution and agreeing with the true solution to terms of order $1/f_i^2$ or higher, can be given. The labour involved in using them, however, begins to approach that involved in working out the true series solution term by term, and there seems, therefore, little point in describing them.


6. Summary. On the basis of the above calculations for the case $f_1 = f_2 = 6$ and the known behaviour of the distribution of $v$ when $f_1$ and $f_2$ are large, it seems safe to say that, although the values of $v$ given in Table 1 are rounded off to two decimal places, the probability of the values thus tabled being exceeded will always equal 0.050 to three decimal places. Usually, indeed, the accuracy of the probability will be even better than this.

Similar considerations for Table 2 suggest that the probability of the given values of $v$ being exceeded will almost everywhere equal 0.0100 to four decimal places.

The above calculations also throw some light on an approximate method which consists in taking for the required $v$ the values in a 'Student' $t$ table corresponding to $F$ degrees of freedom, where

$$\frac{1}{F} = \frac{c^2}{f_1} + \frac{(1-c)^2}{f_2} \qquad\text{and}\qquad c = \frac{\lambda_1 s_1^2}{\lambda_1 s_1^2 + \lambda_2 s_2^2}.$$

If this approximation be used for values of the arguments not covered by Mrs Aspin's tables, or in cases where actual probabilities are required rather than references to fixed probability levels, it usually gives good results.


REFERENCES 

Aspin, A. A. (1948). An examination and further development of a formula occurring in the problem of comparing two mean values. Biometrika, 35, 88-96.

Gosset, W. S. (1925). New tables for testing the significance of observations. Metron, 5, 105. (Reprinted in 'Student's' Collected Papers, Cambridge University Press, 1942.)

Welch, B. L. (1947). The generalization of 'Student's' problem when several different population variances are involved. Biometrika, 34, 28-35.



[ 297 ] 


BIVARIATE DISTRIBUTIONS BASED ON SIMPLE 
TRANSLATION SYSTEMS 

By N. L. JOHNSON 


1. Introduction 


In a previous paper (Johnson, 1949) systems of frequency curves based on transformations of the form

$$z = \gamma + \delta f\left(\frac{x-\xi}{\lambda}\right) = \gamma + \delta f(y),$$

where $z$ is a unit Normal variable, $f$ is a function, and $\gamma$, $\delta$, $\xi$ and $\lambda$ are parameters, have been discussed. The special systems

$S_L$ corresponding to $f(y) = \log y$ ('logNormal'),

$S_B$ corresponding to $f(y) = \log\{y/(1-y)\}$,

and $S_U$ corresponding to $f(y) = \log[y + \sqrt{y^2+1}]$

were considered in some detail.

The present paper is concerned with certain bivariate distributions which can be constructed on the basis of univariate distributions of systems $S_L$, $S_B$ or $S_U$, together with the Normal distribution, denoted below as $S_N$.

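If it be supposed that the unit Normal variables

$$z_1 = \gamma_1 + \delta_1 f_1(y_1), \qquad z_2 = \gamma_2 + \delta_2 f_2(y_2) \qquad (1)$$

have the joint Normal bivariate distribution

$$p(z_1, z_2) = \left[2\pi\sqrt{1-\rho^2}\right]^{-1} \exp\left\{-\tfrac{1}{2}(1-\rho^2)^{-1}\left(z_1^2 - 2\rho z_1 z_2 + z_2^2\right)\right\}, \qquad (2)$$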

the joint distribution of $y_1$ and $y_2$ is thereby determined. While we cannot, of course, say that the joint distribution of $z_1$ and $z_2$ must be of form (2), it is an assumption which should be reasonable in many cases. With equal cogency, therefore, it can be argued that the joint distribution of the variables $y_1$ and $y_2$, defined by (1), should be that implied by (2).

The transformations for $y_1$ and $y_2$ need not belong to the same system. Thus, restricting ourselves to the Normal system, $S_N$, and the systems $S_L$, $S_B$ and $S_U$, we have ten different systems of bivariate distributions for $y_1$ and $y_2$. We will denote these systems by symbols $S_{NN}$, $S_{NL}$, $S_{LB}$, etc. $S_{NN}$ is the bivariate Normal distribution; $S_{LL}$ is the 'logarithmic surface' considered by Wicksell (1917); $S_{NL}$ is the 'semi-logarithmic surface' considered by Yuan (1933). Allowing for choice of dependent and independent variable, there will be sixteen types of regression line (one being the linear regression corresponding to the bivariate Normal case).


2. Comparison with other bivariate systems 

It is of interest to compare the principle employed above (transformation to Normality) with other principles which have been suggested as starting points in the construction of non-Normal bivariate distributions. Steffensen (1922, 1941) suggested that the joint distribution $p(y_1, y_2)$ might be supposed to be of the form

$$g_1(a_1 y_1 + a_2 y_2)\, g_2(b_1 y_1 + b_2 y_2),$$




where $a_1$, $a_2$, $b_1$, $b_2$ are constants and $g_1$, $g_2$ are univariate probability density functions. This is equivalent (Fréchet, 1941) to supposing that independent 'primary variables' can be found which are linear functions of the $y$'s. K. Pearson (1923), in a survey of methods of construction of bivariate distributions, states that this method does not accord well with observed distributions. In the same paper, reference is made to a number of bivariate distributions arising from special cases of a generalization of the differential equation which generates the univariate Pearson system of curves (see also Rhodes, 1922). Reference is also made to contemporaneous work by Narumi (1923), wherein bivariate distributions are derived from certain limitations on the shape of the regression lines and on the type of heteroscedasticity allowable. Pretorius (1930) showed that Narumi's assumption of constant $\beta_1$ and $\beta_2$ in the array distributions is not justifiable in much observational data. Pretorius also considered bivariate distributions based on Charlier curves, on Edgeworth's method of translation, and on a development of our system $S_{LL}$. Recently Van Uven (1947, 1948) has developed a complete system of surfaces generalizing the Pearson univariate system.

3. General properties of the $S_{IJ}$ systems

(a) Array distributions. The system $S_{IJ}$ is defined by the equations

$$z_1 = \gamma_1 + \delta_1 f_I(y_1), \qquad z_2 = \gamma_2 + \delta_2 f_J(y_2) \qquad (3)$$

($f_I$ and $f_J$ referring to the systems $S_I$ and $S_J$ respectively, $I, J = N, L, B$ or $U$), and by the correlation, $\rho$, between $z_1$ and $z_2$.

If $y_1$ be fixed, so is $z_1$. For a fixed value of $z_1$, $z_2$ is distributed Normally with expected value $\rho z_1$ and standard deviation $(1-\rho^2)^{1/2}$. Thus, for $y_1$ fixed, $\gamma_2 + \delta_2 f_J(y_2)$ is distributed Normally with expected value $\rho\{\gamma_1 + \delta_1 f_I(y_1)\}$ and standard deviation $(1-\rho^2)^{1/2}$. Hence

$$\left[\gamma_2 + \delta_2 f_J(y_2) - \rho\{\gamma_1 + \delta_1 f_I(y_1)\}\right](1-\rho^2)^{-1/2}$$

is a unit Normal variable. Thus, given $y_1$, $y_2$ has a distribution of the same system, $S_J$, as its marginal distribution, with

$\gamma_2$ replaced by $[\gamma_2 - \rho\{\gamma_1 + \delta_1 f_I(y_1)\}](1-\rho^2)^{-1/2}$

and $\delta_2$ replaced by $\delta_2(1-\rho^2)^{-1/2}$.

Since $f_I(y_1)$ varies from $-\infty$ to $+\infty$, it is clear that (except when $\rho = 0$) the quantity replacing $\gamma_2$ will change sign at some point in the range of variation of $y_1$. If $S_J$ be a symmetrical transformation then the array distributions to either side of this value of $y_1$ will be skew in opposite directions. Variation in skewness of this kind was present in some of the examples used by Pretorius. It may, of course, so happen that the transition point is so far out in the tail of the $y_1$ distribution that the reversal effect is negligible.

It may also be noted that the quantity replacing $\delta_2$ does not vary with $y_1$. If the distribution of $y_2$ be $S_B$ or $S_U$, the $(\beta_1, \beta_2)$ points of the array distributions all fall on a constant $\delta$ line; if the distribution of $y_2$ be $S_L$, the $(\beta_1, \beta_2)$ point does not vary, so that the array distributions are all of the same shape, though they are not homoscedastic.

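A case of particular interest arises when the marginal distribution of $y_2$ is in the system $S_B$. The array distribution of $y_2$, given $y_1$, will be bimodal provided

$$\frac{\left|\gamma_2 - \rho\{\gamma_1 + \delta_1 f_I(y_1)\}\right|}{\sqrt{1-\rho^2}} < \frac{\sqrt{1-\rho^2-2\delta_2^2}}{\delta_2} - \frac{2\delta_2}{\sqrt{1-\rho^2}}\tanh^{-1}\sqrt{\frac{1-\rho^2-2\delta_2^2}{1-\rho^2}} \qquad (4)$$

(cf. equation (26) of Johnson (1949)).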



N. L. Johnson 




We observe that

(i) If $\rho^2 > 1 - 2\delta_2^2$, then none of the array distributions are bimodal.

(ii) There are always some unimodal array distributions. Bimodal array distributions, if any, correspond to values of $y_1$ in a single finite interval.
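The array parameters and the bimodality check can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and `sb_is_bimodal` encodes the $S_B$ criterion in the form used in (4), namely $\delta < 1/\sqrt{2}$ and $|\gamma| < \delta^{-1}\sqrt{1-2\delta^2} - 2\delta\tanh^{-1}\sqrt{1-2\delta^2}$, which we take to be Johnson's (1949) equation (26).

```python
import math

def array_params(gamma1, delta1, gamma2, delta2, rho, f_I, y1):
    """gamma and delta of the S_J array distribution of y2 for a fixed y1, section 3(a)."""
    s = math.sqrt(1.0 - rho ** 2)
    gamma_array = (gamma2 - rho * (gamma1 + delta1 * f_I(y1))) / s
    delta_array = delta2 / s
    return gamma_array, delta_array

def sb_is_bimodal(gamma, delta):
    """Assumed S_B bimodality criterion (Johnson 1949, eq. 26) applied to one array."""
    if 2.0 * delta ** 2 >= 1.0:        # needs delta < 1/sqrt(2), i.e. rho^2 < 1 - 2 delta_2^2
        return False
    root = math.sqrt(1.0 - 2.0 * delta ** 2)
    bound = root / delta - 2.0 * delta * math.atanh(root)
    return abs(gamma) < bound

# Hypothetical S_NB surface: y1 Normal (f_I(y) = y), y2 in S_B with delta_2 = 0.3, rho = 0.5.
for y1 in (-10.0, 0.0, 10.0):
    print(y1, sb_is_bimodal(*array_params(0.0, 1.0, 0.0, 0.3, 0.5, lambda y: y, y1)))
```

Only the central array ($y_1 = 0$) comes out bimodal here, in line with observation (ii); raising $\delta_2$ to 0.6 with $\rho = 0.6$ makes $\rho^2 > 1 - 2\delta_2^2$ and removes bimodality altogether, as in (i).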

(b) Median regression. Since the array distribution of $y_2$, given $y_1$, is in the system $S_J$ to which the marginal distribution of $y_2$ belongs, it follows that the expected value of $y_2$, given $y_1$, may be expressed as an explicit function of $y_1$ using formulae derived by Johnson (1949). This would give the regression curve of $y_2$ on $y_1$. The expression obtained would, however, in general be complicated and unsuited to further analytical study.

Much simpler and more informative formulae are obtained if we write down the expression for the median value of $y_2$, given $y_1$. The possibility of using the median regression was mentioned by Narumi (1923), though he did not pursue the idea.

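The median value $\tilde{y}_2$ of $y_2$, for a fixed value of $y_1$, satisfies the equation

$$\gamma_2 + \delta_2 f_J(\tilde{y}_2) = \rho\{\gamma_1 + \delta_1 f_I(y_1)\},$$

i.e.

$$f_J(\tilde{y}_2) = \frac{\rho\gamma_1 - \gamma_2}{\delta_2} + \frac{\rho\delta_1}{\delta_2} f_I(y_1). \tag{5}$$

The median regression will be linear if $\rho\delta_1 = \delta_2$, $\rho\gamma_1 = \gamma_2$ and $f_I = f_J$. The only other case giving linear regression is $f_I(y) = f_J(y) = y$, which is $S_{NN}$, the Normal bivariate case.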

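Table 1 shows the median regression equations for the sixteen possible cases covered by the $S_B$, $S_U$, $S_L$ and Normal marginal distributions. For convenience the parameters

$$\theta = e^{(\rho\gamma_1 - \gamma_2)/\delta_2}, \qquad \phi = \rho\delta_1/\delta_2 \tag{6}$$

have been introduced.

Table 1. Median regression of $y_2$ on $y_1$

| Distribution of $y_2$ \ of $y_1$ | $S_N$ | $S_L$ | $S_B$ | $S_U$ |
|---|---|---|---|---|
| $S_N$ | $\log\theta + \phi y_1$ | $\log\theta + \phi\log y_1$ | $\log\theta + \phi\log\{y_1/(1-y_1)\}$ | $\log\theta + \phi\log\{y_1 + \sqrt{y_1^2+1}\}$ |
| $S_L$ | $\theta e^{\phi y_1}$ | $\theta y_1^{\phi}$ | $\theta\{y_1/(1-y_1)\}^{\phi}$ | $\theta\{y_1 + \sqrt{y_1^2+1}\}^{\phi}$ |
| $S_B$ | $1/(1 + \theta^{-1}e^{-\phi y_1})$ | $1/(1 + \theta^{-1}y_1^{-\phi})$ | $1/[1 + \theta^{-1}\{(1-y_1)/y_1\}^{\phi}]$ | $1/[1 + \theta^{-1}\{\sqrt{y_1^2+1} - y_1\}^{\phi}]$ |
| $S_U$ | $\tfrac12(\theta e^{\phi y_1} - \theta^{-1}e^{-\phi y_1})$ | $\tfrac12(\theta y_1^{\phi} - \theta^{-1}y_1^{-\phi})$ | $\tfrac12[\theta\{y_1/(1-y_1)\}^{\phi} - \theta^{-1}\{(1-y_1)/y_1\}^{\phi}]$ | $\tfrac12[\theta\{y_1 + \sqrt{y_1^2+1}\}^{\phi} - \theta^{-1}\{\sqrt{y_1^2+1} - y_1\}^{\phi}]$ |

The general appearance of these curves is indicated in Figs. 1–9 (those involving Normal marginal distributions in respect of either $y_1$ or $y_2$ are omitted). It will be noted that in many cases the general slope of the regression depends on the range of values in which $\phi$ lies. The case $\phi < 0$ will arise when $\rho < 0$, as we have assumed throughout this paper that $\delta$ is positive.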
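Equation (5) yields every entry of Table 1 at once, as $\tilde{y}_2 = f_J^{-1}\{\log\theta + \phi f_I(y_1)\}$. The Python sketch below (hypothetical names) encodes $f$ and $f^{-1}$ for the four systems, assuming the standard forms of Johnson's translations: $f_N(y) = y$, $f_L(y) = \log y$, $f_B(y) = \log\{y/(1-y)\}$ and $f_U(y) = \sinh^{-1} y$.

```python
import math

# (f, f^{-1}) for the four systems; S_B and S_L assume y already standardized.
F = {
    "N": (lambda y: y, lambda u: u),
    "L": (math.log, math.exp),
    "B": (lambda y: math.log(y / (1.0 - y)), lambda u: 1.0 / (1.0 + math.exp(-u))),
    "U": (math.asinh, math.sinh),
}

def median_regression(I, J, gamma1, delta1, gamma2, delta2, rho, y1):
    """Median of y2 given y1 in S_IJ, from equation (5) with theta, phi as in (6)."""
    f_I = F[I][0]
    f_J_inv = F[J][1]
    log_theta = (rho * gamma1 - gamma2) / delta2
    phi = rho * delta1 / delta2
    return f_J_inv(log_theta + phi * f_I(y1))

# S_LL case: the result equals theta * y1**phi, the S_L/S_L entry of Table 1.
print(median_regression("L", "L", 0.5, 1.2, -0.3, 0.8, 0.6, 2.0))
```

Keeping the transformations in a table makes the sixteen cases differ only in which pair of functions is looked up.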




4. Percentage zones in the $S_{IJ}$ systems

It is, of course, a simple matter to construct regions in the $(y_1, y_2)$ plane which will contain any desired proportion of the frequency distribution of $y_1$ and $y_2$. If $k_\alpha$ be the number defined by

$$\frac{1}{\sqrt{2\pi}}\int_{-k_\alpha}^{k_\alpha} e^{-\frac12 t^2}\,dt = \alpha,$$

then, for a fixed value of $y_1$, the probability is $\alpha$ that

$$\frac{-k_\alpha - [\gamma_2 - \rho\{\gamma_1 + \delta_1 f_I(y_1)\}](1-\rho^2)^{-1/2}}{\delta_2(1-\rho^2)^{-1/2}} < f_J(y_2) < \frac{k_\alpha - [\gamma_2 - \rho\{\gamma_1 + \delta_1 f_I(y_1)\}](1-\rho^2)^{-1/2}}{\delta_2(1-\rho^2)^{-1/2}}.$$

Hence the curves

$$\pm k_\alpha(1-\rho^2)^{1/2} + \gamma_2 + \delta_2 f_J(y_2) = \rho\{\gamma_1 + \delta_1 f_I(y_1)\} \tag{7}$$

will enclose a zone containing a proportion $\alpha$ of the total frequency surface. This zone may be regarded as a 'percentage zone' for $y_2$ given $y_1$. It will be noted that the boundary curves (7) will be of the same type as the median regression of $y_2$ on $y_1$. They will have the same '$\phi$' ($= \rho\delta_1/\delta_2$) as the median regression, but the values of '$\theta$' will be

$$\exp[\delta_2^{-1}\{\rho\gamma_1 - \gamma_2 \pm k_\alpha(1-\rho^2)^{1/2}\}].$$

Naturally there will be another similar zone containing a proportion $\alpha$ of the total frequency, forming a 'percentage zone' for $y_1$ given $y_2$.
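The two boundary constants of a percentage zone can be computed directly. In this minimal Python sketch `zone_thetas` is a hypothetical helper; $k_\alpha$ is obtained from $\Phi(k_\alpha) = \tfrac12(1+\alpha)$ with `statistics.NormalDist.inv_cdf`, and the example uses the bean-surface constants quoted in section 5.

```python
import math
from statistics import NormalDist

def zone_thetas(alpha, gamma1, gamma2, delta2, rho):
    """'theta' values of the two boundary curves (7) of the alpha zone for y2 given y1.

    Both curves share the median regression's phi = rho * delta1 / delta2."""
    k_alpha = NormalDist().inv_cdf(0.5 + alpha / 2.0)   # central proportion alpha of N(0, 1)
    shift = k_alpha * math.sqrt(1.0 - rho ** 2)
    theta_low = math.exp((rho * gamma1 - gamma2 - shift) / delta2)
    theta_high = math.exp((rho * gamma1 - gamma2 + shift) / delta2)
    return theta_low, theta_high

# 90 per cent zone for breadth given length, bean constants of section 5.
print(zone_thetas(0.90, 2.38, 2.13, 3.55, 0.746))
```

The geometric mean of the two values is the median-regression $\theta$ of (6), so the zone straddles the median curve symmetrically on the transformed scale.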

A third region containing a given proportion of the total frequency, which is of some interest, is that obtained from the transformation of the corresponding $\chi^2$ ellipse:

$$z_1^2 - 2\rho z_1 z_2 + z_2^2 = \text{const.} \tag{8}$$

Substituting from (1) in (8) will give the equation of the boundary of a region in the $(y_1, y_2)$ plane which will contain the same proportion of total frequency as does the $\chi^2$ ellipse in the $(z_1, z_2)$ plane.
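The constant in (8) for a given proportion follows from a standard fact: for a unit bivariate Normal pair with correlation $\rho$, $(z_1^2 - 2\rho z_1 z_2 + z_2^2)/(1-\rho^2)$ has a $\chi^2$ distribution with 2 degrees of freedom, so the proportion inside the ellipse is $1 - e^{-c/2}$. Because the mapping (3) between $(y_1, y_2)$ and $(z_1, z_2)$ is one-to-one, the transformed region holds the same proportion. The Python sketch below (hypothetical helper names) computes the constant and checks it by simulation.

```python
import math
import random

def ellipse_constant(alpha, rho):
    """const in (8) so that the ellipse holds a proportion alpha of the (z1, z2) surface."""
    # (z1^2 - 2 rho z1 z2 + z2^2) / (1 - rho^2) is chi-square, 2 d.f.: P{<= c} = 1 - exp(-c/2).
    return -2.0 * (1.0 - rho ** 2) * math.log(1.0 - alpha)

def monte_carlo_coverage(alpha, rho, n=200_000, seed=1):
    """Simulated proportion of unit bivariate Normal points inside the ellipse (8)."""
    rng = random.Random(seed)
    const = ellipse_constant(alpha, rho)
    inside = 0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        if z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2 <= const:
            inside += 1
    return inside / n

print(ellipse_constant(0.9, 0.746), monte_carlo_coverage(0.9, 0.746))
```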

5. Method of fitting

The most straightforward way of fitting a joint distribution of the kind proposed above seems to be as follows:

(i) Fit each marginal distribution by a curve from one of the four systems (Normal, $S_L$, $S_B$ or $S_U$).

(ii) Calculate the observed correlation between the transformed variables, using the results of (i) to obtain the latter.

The functions $f_I$ and $f_J$, and the values of all the parameters except $\rho$, are determined by (i); (ii) determines $\rho$.

The example below illustrates this method of fitting. It will be noted that in this example the higher order moments of the arrays are not well reproduced by the fitted surface. It is well to remember, however, that Pretorius (1930) obtained results which were scarcely as good when he fitted a surface of Jorgensen's Type AA to the same data. The equation of this surface contains fourteen arbitrary constants, while the $S_{UU}$ surface fitted in the example below depends on only nine arbitrary constants.

Example. In examples 3 and 4 of Johnson (1949), curves of system $S_U$ have been fitted to distributions of length and breadth, respectively, of 9440 beans. We now proceed to fit a surface of system $S_{UU}$ to the joint distribution of length and breadth. The correlation between the transformed variables

$$\sinh^{-1}\left(\frac{\text{length} - 16.0745}{1.5192}\right) \quad\text{and}\quad \sinh^{-1}\left(\frac{\text{breadth} - 8.6195}{0.9721}\right)$$

is found to be 0.746. Hence, denoting length by $x_1$ and breadth by $x_2$, the constants of the fitted $S_{UU}$ surface are

$$\gamma_1 = 2.38, \quad \delta_1 = 2.64, \quad \xi_1 = 16.0745, \quad \lambda_1 = 1.5192,$$
$$\gamma_2 = 2.13, \quad \delta_2 = 3.55, \quad \xi_2 = 8.6195, \quad \lambda_2 = 0.9721,$$
$$\rho = 0.746.$$

[Figs. 10–13: regression, scedasticity, skewness and kurtosis of the arrays of breadth on length, plotted against length of beans in mm. Figs. 14–17: the same four quantities for the arrays of length on breadth, plotted against breadth of beans in mm.]

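Consider now the array distribution of $x_2$ (breadth) for a fixed value of $x_1$ (length). This will be a curve of system $S_U$ with parameters

$$\gamma(x_2 \mid x_1) = \left\{\gamma_2 - \rho\gamma_1 - \rho\delta_1\sinh^{-1}\left(\frac{x_1 - \xi_1}{\lambda_1}\right)\right\}(1-\rho^2)^{-1/2}, \qquad \delta(x_2 \mid x_1) = \delta_2(1-\rho^2)^{-1/2}.$$

Hence, inserting numerical values,

$$\delta(x_2 \mid x_1) = 5.3307$$

and

$$\Omega(x_2 \mid x_1) = \gamma(x_2 \mid x_1)/\delta(x_2 \mid x_1) = \delta_2^{-1}(\gamma_2 - \rho\gamma_1) - \rho\delta_1\delta_2^{-1}\sinh^{-1}\left(\frac{x_1 - \xi_1}{\lambda_1}\right) = 0.104 - 0.5548\,\sinh^{-1}\left(\frac{x_1 - 16.0745}{1.5192}\right).$$

Also, of course, $\xi(x_2 \mid x_1) = \xi_2 = 8.6195$ and $\lambda(x_2 \mid x_1) = \lambda_2 = 0.9721$.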

Using these values of the parameters, the expected value, standard deviation, $\beta_1$ and $\beta_2$ of the array distributions of breadth were calculated. The results are shown graphically by the continuous curves in Figs. 10–13. The ringed dots show the corresponding values calculated from the observed array distributions by Pretorius. Figs. 10–13 are to be compared with that author's diagrams VI (a)–(d); to facilitate the comparison they have been constructed in a similar manner.
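The first two array moments follow from the standard $S_U$ results: mean $= \xi - \lambda\omega^{1/2}\sinh\Omega$ and variance $= \tfrac12\lambda^2(\omega - 1)(\omega\cosh 2\Omega + 1)$, with $\omega = e^{\delta^{-2}}$ and $\Omega = \gamma/\delta$, for $x = \xi + \lambda\sinh\{(z-\gamma)/\delta\}$. The Python sketch below (hypothetical function names) applies them to the breadth arrays using the constants quoted above. It covers only the expected value and standard deviation, not $\beta_1$ and $\beta_2$.

```python
import math

def su_mean_sd(gamma, delta, xi, lam):
    """Mean and s.d. of x = xi + lam * sinh((z - gamma) / delta), z a unit Normal variable."""
    omega = math.exp(delta ** -2)
    Omega = gamma / delta
    mean = xi - lam * math.sqrt(omega) * math.sinh(Omega)
    var = 0.5 * lam ** 2 * (omega - 1.0) * (omega * math.cosh(2.0 * Omega) + 1.0)
    return mean, math.sqrt(var)

def breadth_array(length):
    """Array of breadth given length, with delta, Omega, xi, lambda as quoted in the text."""
    delta = 5.3307
    Omega = 0.104 - 0.5548 * math.asinh((length - 16.0745) / 1.5192)
    return su_mean_sd(Omega * delta, delta, 8.6195, 0.9721)

for length in (14.0, 16.0, 18.0):
    print(length, breadth_array(length))
```

Mean breadth rises with length, as the regression in Fig. 10 requires, because $\Omega(x_2 \mid x_1)$ decreases as $x_1$ increases.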

For array distributions of $x_1$ (length) for a fixed $x_2$ it was found that

$$\delta(x_1 \mid x_2) = 3.9642,$$
$$\Omega(x_1 \mid x_2) = 0.300 - 1.0031\,\sinh^{-1}\left(\frac{x_2 - 8.6195}{0.9721}\right),$$
$$\xi(x_1 \mid x_2) = \xi_1 = 16.0745, \qquad \lambda(x_1 \mid x_2) = \lambda_1 = 1.5192.$$

Figs. 14–17 are analogous to Figs. 10–13 and compare curves giving the theoretical values of constants of the array distributions of $x_1$ with the observed values (cf. Pretorius's diagrams VI (e)–(h)).

From a study of the eight diagrams (Figs. 10–17) it appears that agreement between theory and observation becomes less satisfactory as higher moments come into consideration. The regression lines are quite close fits. The scedastic lines, while not giving a close fit, seem to be fairly reasonable, and are about as good fits as the Type AA lines shown by Pretorius. In the case of the lines showing skewness and kurtosis, the most that can be said is that they do indicate the general trend and order of the observed values. A peculiar feature is that in all four cases the fit of these lines would seem to be improved if they were displaced to the right (i.e. in the direction of decreasing length or breadth).

It should be noted that Pretorius did not calculate the $\beta_1$ and $\beta_2$ coefficients for the arrays of the Type AA surface.



Bivariate distributions based on simple translation systems 

REFERENCES 

Fréchet, M. (1941). Skand. Aktuar. 24, 214.

Johnson, N. L. (1949). Biometrika, 36, 149.

Narumi, S. (1923). Biometrika, 15, 77.

Pearson, K. (1923). Biometrika, 15, 222, 231.

Pretorius, S. J. (1930). Biometrika, 22, 109.

Rhodes, E. C. (1922). Biometrika, 14, 355.

Steffensen, J. F. (1922). Skand. Aktuar. 5, 73.

Steffensen, J. F. (1941). Skand. Aktuar. 24, 1.

Van Uven, M. J. (1947). Ned. Akad. Wet. Proc. 50, 1063, 1252.

Van Uven, M. J. (1948). Ned. Akad. Wet. Proc. 51, 41, 191.

Wicksell, S. D. (1917). Ark. Mat. Astr. Fys. 12, no. 20.

Yuan, P. T. (1933). Ann. Math. Statist. 4, 30.





A TEST FOR RANDOMNESS IN A SEQUENCE OF TWO ALTERNATIVES INVOLVING A 2×2 TABLE

By P. G. MOORE, University College, London

1. Introduction

Recently the power function technique has been used to examine tests of randomness applied to a sequence of two alternatives. F. N. David (1947) has considered the power function of what may be termed the 'group' test, and G. I. Bateman (1948a) has similarly considered the 'longest run' test. In both cases the hypothesis tested, say $H_0$, is that there is randomness within the sequence, against the admissible alternative, $H_1$, of dependence of the kind found in a simple Markoff chain. We propose here to deal with a slight variation of the group test which leads to a 2×2 table. We shall then examine the power of this test against the alternative of positive dependence in the sequence, of the type met with in a simple Markoff chain. This will enable us, before some set of data is collected, to find approximately how large a sample is necessary in order to be able to detect a specified degree of dependence using a given significance level.

In the sequence of alternatives we write 1 for the happening of a certain event and 0 for its negation. For example, in a sequence of tyres coming off a manufacturing plant, a 1 might indicate a tyre rejected as not being up to standard. We would be interested here in seeing whether the rejected tyres were randomly placed in the sequence or whether there was an undue clustering effect of the rejected tyres. Taking successive couplets along the sequence, the usual procedure is to represent the result in a 2×2 table using the notation given in Table 1. The value of $N$ will obviously be one less than the number of items in the sequence, and also $r$ and $m$ will not differ by more than unity. Our test will be concerned with the form of this table under (i) randomness and (ii) dependence of the simple Markoff type.


Table 1

| Preceding trial \ Present trial | 1 | 0 | Total |
|---|---|---|---|
| 1 | $a$ | $c$ | $m$ |
| 0 | $b$ | $d$ | $n$ |
| Total | $r$ | $e$ | $N$ |


2. Presentation of data

The preceding manner of arranging the data has been used, among others, by Cochran (1938). He then compared the two proportions $a/m$ and $b/n$, using the standard test for this kind of data. However, for a given sequence, the values in the 2×2 table formed are not uniquely fixed by the numbers of 1's and 0's in the sequence. To clarify the position we may consider how the various cases arise. Suppose that in a sequence of $R$ units there are $r_1$ 1's and $r_2$ 0's. Obviously $r_1 + r_2 = R$. Four cases may now be distinguished.

A. Sequence begins with a 1 and ends with a 0.

B. Sequence begins with a 1 and ends with a 1.

C. Sequence begins with a 0 and ends with a 0.

D. Sequence begins with a 0 and ends with a 1.

A and D must each have an even number of groups, say $2t$, in the sequence, while B and C will have an odd number, say $2t + 1$. These cases give rise to the 2×2 tables shown in Table 2.


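Table 2

Case A

| Preceding trial \ Present trial | 1 | 0 | Total |
|---|---|---|---|
| 1 | $r_1 - t$ | $t$ | $r_1$ |
| 0 | $t - 1$ | $r_2 - t$ | $r_2 - 1$ |
| Total | $r_1 - 1$ | $r_2$ | $R - 1$ |

Case B

| Preceding trial \ Present trial | 1 | 0 | Total |
|---|---|---|---|
| 1 | $r_1 - t - 1$ | $t$ | $r_1 - 1$ |
| 0 | $t$ | $r_2 - t$ | $r_2$ |
| Total | $r_1 - 1$ | $r_2$ | $R - 1$ |

Case C

| Preceding trial \ Present trial | 1 | 0 | Total |
|---|---|---|---|
| 1 | $r_1 - t$ | $t$ | $r_1$ |
| 0 | $t$ | $r_2 - t - 1$ | $r_2 - 1$ |
| Total | $r_1$ | $r_2 - 1$ | $R - 1$ |

Case D

| Preceding trial \ Present trial | 1 | 0 | Total |
|---|---|---|---|
| 1 | $r_1 - t$ | $t - 1$ | $r_1 - 1$ |
| 0 | $t$ | $r_2 - t$ | $r_2$ |
| Total | $r_1$ | $r_2 - 1$ | $R - 1$ |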


It is clear that, although the data have been exhibited in the form of a 2×2 table, the underlying criterion is still that of the number of groups in a sequence of $r_1$ 1's and $r_2$ 0's. Hence we are concerned with finding the power function of a test based on the number of groups in the sequence.
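Because the table is fixed by the sequence, the cell counts and the number of groups can be read off together. The Python sketch below (hypothetical helper names) tallies successive couplets in the layout of Table 1 and counts groups as one plus the number of changes of symbol. For the sequence 110010 it reproduces the Case A entries of Table 2 with $r_1 = r_2 = 3$ and $t = 2$.

```python
def transition_table(seq):
    """Cells (a, b, c, d) of Table 1 (rows: preceding trial, columns: present trial) and T."""
    a = b = c = d = 0
    for prev, cur in zip(seq, seq[1:]):
        if prev == 1 and cur == 1:
            a += 1
        elif prev == 1:          # 1 followed by 0
            c += 1
        elif cur == 1:           # 0 followed by 1
            b += 1
        else:                    # 0 followed by 0
            d += 1
    T = 1 + b + c                # every change of symbol starts a new group
    return (a, b, c, d), T

def case_of(seq):
    """Case A, B, C or D of section 2, from the first and last symbols."""
    return {(1, 0): "A", (1, 1): "B", (0, 0): "C", (0, 1): "D"}[(seq[0], seq[-1])]

seq = [1, 1, 0, 0, 1, 0]
print(case_of(seq), transition_table(seq))
```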

3. Calculation of power function

The principles of a Markoff chain have been discussed in detail, for example, by Fréchet (1938), but the simple notation used here will be that of David (1947). If $E_s$ represents the $s$th event, which may be either a 1 or a 0, then under $H_0$, the hypothesis of independence,

$$P\{E_s = 1\} = p, \qquad P\{E_s = 0\} = q, \qquad \text{where } p + q = 1.$$

Under the alternative hypothesis $H_1$ we have

$$P\{E_1 = 1\} = P, \qquad P\{E_1 = 0\} = Q,$$
$$P\{E_s = 1 \mid E_{s-1} = 1\} = p_1, \qquad P\{E_s = 0 \mid E_{s-1} = 1\} = q_1,$$
$$P\{E_s = 1 \mid E_{s-1} = 0\} = p_2, \qquad P\{E_s = 0 \mid E_{s-1} = 0\} = q_2,$$

where $P + Q = 1$, $p_1 + q_1 = 1$ and $p_2 + q_2 = 1$.


Making the simplifying assumption (David, 1947, p. 337) that if nothing is known about the $s - 1$ trials preceding the $s$th trial, then $P\{E_s = 1\} = P$ and $P\{E_s = 0\} = Q$, we have that $p_2 = Pq_1/Q$, or $P = p_2/(p_2 + q_1)$. Using these relations and taking, say, the case of an even number of groups, it is possible with the aid of the formulae for $P\{2t \text{ groups} \mid H_1\}$ given by David (1947, p. 337) to evaluate a conditional power function. However, laborious calculations are necessary to evaluate this, even for a short sequence, so that some form of approximation is required. We will find, separately for an even ($2t$) or odd ($2t + 1$) number of groups, the forms of $p(t \mid r_1 r_2 H_0)$ and $p(t \mid r_1 r_2 H_1)$, where $p(\;)$ indicates a continuous probability distribution taken as an approximation to the set of discrete probabilities denoted by $P\{\;\}$. Then our 'conditional' critical region $\omega_\alpha$ will be such that

$$P\{E \in \omega_\alpha \mid r_1 r_2 H_0\} \le \alpha,$$

where $E$ is the sample point. Since our alternative hypothesis is that there is positive dependence in the sequence, we are interested in establishing the significance of low values of $t$. Hence we want to find the greatest value of $t_\alpha$ such that

$$P\{t \le t_\alpha \mid H_0\} \le \alpha.$$

This is equivalent to finding the greatest value of $t_\alpha$ satisfying

$$\sum_{t=1}^{t_\alpha} p(t \mid r_1 r_2 H_0) \le \alpha. \tag{1}$$

The power of the test, $\pi$, will be given by

$$\pi = P\{E \in \omega_\alpha \mid H_1\} = P\{t \le t_\alpha \mid H_1\} = \sum_{t=1}^{t_\alpha} p(t \mid r_1 r_2 H_1). \tag{2}$$


4. Approximation to the distribution 


(a) Even number of groups 

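We will consider in the first place the case of an even number of groups. It is known (David, 1947, p. 335) that if there is an even number of groups ($2t$), then

$$P\{t \mid r_1 r_2 H_0\} = k\,{}^{r_1-1}C_{t-1}\,{}^{r_2-1}C_{t-1},$$
$$P\{t \mid r_1 r_2 H_1\} = k'\,{}^{r_1-1}C_{t-1}\,{}^{r_2-1}C_{t-1}\left(\frac{q_1 p_2}{p_1 q_2}\right)^{t}, \tag{3}$$

$k$ and $k'$ being constants such that $\Sigma P\{2t\} = 1$, where $\Sigma$ denotes summation over all possible values of $t$. From Wald & Wolfowitz (1940) we have that as $r_1$ tends to infinity, $r_1/r_2$ remaining finite, the distribution in (3) under $H_0$ tends to normality, so that

$$p(t \mid r_1 r_2 H_0) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(t-\tau)^2}{2\sigma^2}\right], \tag{4}$$

where $\tau$ and $\sigma^2$ are the mean and variance of $t$ and can be derived from (3). Under $H_1$ we shall have

$$p(t \mid r_1 r_2 H_1) \propto \exp\left[-\frac{(t-\tau)^2}{2\sigma^2} + t\log\frac{q_1 p_2}{p_1 q_2}\right];$$

on completing the square for $t$ and making the integral over all possible values of $t$ equal to unity, this gives

$$p(t \mid r_1 r_2 H_1) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{1}{2\sigma^2}\left\{t - \left(\tau + \sigma^2\log\frac{q_1 p_2}{p_1 q_2}\right)\right\}^2\right]. \tag{5}$$

Hence $t$ is again normally distributed with the same variance as before but with a new mean, $\tau'$, where

$$\tau' = \tau + \sigma^2\log_e\frac{q_1 p_2}{p_1 q_2}. \tag{6}$$

From equations (1) and (4) we can obtain the critical region and from (2) and (5) the power.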
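Equations (1), (2), (4), (5) and (6) translate directly into a short computation. The Python sketch below (hypothetical names) sums the Normal ordinates at integer $t$, as in (1) and (2), and takes $\tau$ and $\sigma^2$ as inputs; the example uses $r_1 = r_2 = 100$ with $P = 0.5$, so that $p_2 = q_1$.

```python
import math

def normal_density(t, mean, var):
    return math.exp(-(t - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def even_case_test(tau, sigma2, p1, p2, alpha=0.05):
    """t_alpha from (1) with (4), and the power from (2) with (5)-(6), for 2t groups."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    tau_prime = tau + sigma2 * math.log(q1 * p2 / (p1 * q2))    # equation (6)
    t_alpha, size, t = 0, 0.0, 1
    while size + normal_density(t, tau, sigma2) <= alpha:        # equation (1)
        size += normal_density(t, tau, sigma2)
        t_alpha, t = t, t + 1
    power = sum(normal_density(s, tau_prime, sigma2) for s in range(1, t_alpha + 1))  # (2)
    return t_alpha, size, power

# r1 = r2 = 100: tau = 50.5, sigma^2 = 12.4378; p1 = 0.7 and p2 = q1 = 0.3.
print(even_case_test(50.5, 12.4378, 0.7, 0.3))
```

With $p_1 = p_2$ the logarithm in (6) vanishes, so the computed power collapses to the size of the critical region, as it should.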
The values of $\tau$ and $\sigma^2$ may be obtained in a simple way following the method outlined for the complete distribution by Wald & Wolfowitz. We need the following three quantities:

$$\alpha = \sum_{c=1} {}^{r_1-1}C_{c-1}\,{}^{r_2-1}C_{c-1}, \qquad \beta = \sum_{c=1} c\,{}^{r_1-1}C_{c-1}\,{}^{r_2-1}C_{c-1}, \qquad \gamma = \sum_{c=1} c^2\,{}^{r_1-1}C_{c-1}\,{}^{r_2-1}C_{c-1}.$$

It will be seen that $\alpha$ is the term independent of $x$ in the expansion of $(1+x)^{r_1-1}(1+1/x)^{r_2-1}$, and hence is equal to ${}^{r_1+r_2-2}C_{r_1-1}$, while $(\beta - \alpha)$ is the term independent of $x$ in the expansion of $(r_1-1)(1+x)^{r_1-2}\,x\,(1+1/x)^{r_2-1}$, and hence is equal to $(r_1-1)\,{}^{r_1+r_2-3}C_{r_2-2}$. The expression $\sum_{c}(c-1)^2\,{}^{r_1-1}C_{c-1}\,{}^{r_2-1}C_{c-1}$ is the term independent of $x$ in the expansion of

$$(r_1-1)(r_2-1)(1+x)^{r_1-2}(1+1/x)^{r_2-2},$$

and is thus equal to $(r_1-1)(r_2-1)\,{}^{r_1+r_2-4}C_{r_1-2}$. Using the identity $c^2 = (c-1)^2 + 2c - 1$ we can evaluate $\gamma$. From these values we obtain for $\tau$ and $\sigma^2$

$$\tau = \frac{r_1 r_2 - 1}{r_1 + r_2 - 2}, \tag{7}$$

$$\sigma^2 = \frac{(r_1-1)^2(r_2-1)^2}{(r_1+r_2-2)^2(r_1+r_2-3)}. \tag{8}$$
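The generating-function argument can be checked numerically. The Python sketch below (hypothetical function names) sums the exact $H_0$ distribution (3) directly and compares its mean and variance with (7) and (8).

```python
import math

def even_moments_exact(r1, r2):
    """Mean and variance of t from the H0 terms of (3), by direct summation."""
    weights = {t: math.comb(r1 - 1, t - 1) * math.comb(r2 - 1, t - 1)
               for t in range(1, min(r1, r2) + 1)}
    total = sum(weights.values())
    mean = sum(t * w for t, w in weights.items()) / total
    var = sum(t * t * w for t, w in weights.items()) / total - mean ** 2
    return mean, var

def even_moments_formula(r1, r2):
    """Equations (7) and (8)."""
    tau = (r1 * r2 - 1) / (r1 + r2 - 2)
    sigma2 = (r1 - 1) ** 2 * (r2 - 1) ** 2 / ((r1 + r2 - 2) ** 2 * (r1 + r2 - 3))
    return tau, sigma2

print(even_moments_exact(25, 15), even_moments_formula(25, 15))
```

Python's arbitrary-precision integers keep the binomial weights exact, so agreement is limited only by the final floating-point divisions.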


(b) Odd number of groups 

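For the case of an odd number of groups the expressions concerned are not nearly as simple. We have, if there be $2t + 1$ groups,

$$P\{t \mid r_1 r_2 H_0\} = l\left[{}^{r_1-1}C_{t-1}\,{}^{r_2-1}C_{t} + {}^{r_1-1}C_{t}\,{}^{r_2-1}C_{t-1}\right],$$
$$P\{t \mid r_1 r_2 H_1\} = l'\left(\frac{q_1 p_2}{p_1 q_2}\right)^{t}\left[\frac{Q}{q_2}\,{}^{r_1-1}C_{t-1}\,{}^{r_2-1}C_{t} + \frac{P}{p_1}\,{}^{r_1-1}C_{t}\,{}^{r_2-1}C_{t-1}\right], \tag{9}$$

$l$ and $l'$ being constants such that the summation over all possible values of $t$ is unity. We can deduce the mean ($\lambda$) and variance ($\mu^2$) of $t$ under $H_0$ from (9) by similar methods to those employed above, giving

$$\lambda = \frac{r_1 r_2\{(r_1-1)^2 + (r_2-1)^2\}}{(r_1+r_2-2)(r_1^2 + r_2^2 - r_1 - r_2)}, \tag{10}$$

$$\mu^2 = \frac{r_1 r_2\{(r_1-1)^2(r_1 r_2 - r_2 - 1) + (r_2-1)^2(r_1 r_2 - r_1 - 1)\}}{(r_1+r_2-2)(r_1+r_2-3)(r_1^2 + r_2^2 - r_1 - r_2)} - \lambda^2. \tag{11}$$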

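Again, following Wald & Wolfowitz, we deduce that $t$ will tend to be normally distributed as $r_1$ tends to infinity, $r_1/r_2$ remaining finite, so that

$$p(t \mid r_1 r_2 H_0) = \frac{1}{\mu\sqrt{2\pi}}\exp\left[-\frac{(t-\lambda)^2}{2\mu^2}\right].$$

Under $H_1$ we have that

$$p(t \mid r_1 r_2 H_1) = \frac{1}{\mu_1\sqrt{2\pi}}\exp\left[-\frac{1}{2\mu_1^2}\left\{t - \left(\xi + \mu_1^2\log\frac{q_1 p_2}{p_1 q_2}\right)\right\}^2\right], \tag{12}$$

where $\xi$ and $\mu_1^2$ are the mean and variance of $t$ in the distribution whose terms are proportional to

$$\phi = \frac{P}{p_1}\,{}^{r_1-1}C_{t}\,{}^{r_2-1}C_{t-1} + \frac{Q}{q_2}\,{}^{r_1-1}C_{t-1}\,{}^{r_2-1}C_{t}.$$

Using the same methods as before we obtain the expressions

$$\xi = \frac{r_1 r_2\left\{\frac{P}{p_1}(r_1-1)^2 + \frac{Q}{q_2}(r_2-1)^2\right\}}{(r_1+r_2-2)\left\{\frac{P}{p_1}r_1(r_1-1) + \frac{Q}{q_2}r_2(r_2-1)\right\}}, \tag{13}$$

$$\mu_1^2 = \frac{r_1 r_2\left\{\frac{P}{p_1}(r_1-1)^2(r_1 r_2 - r_2 - 1) + \frac{Q}{q_2}(r_2-1)^2(r_1 r_2 - r_1 - 1)\right\}}{(r_1+r_2-2)(r_1+r_2-3)\left\{\frac{P}{p_1}r_1(r_1-1) + \frac{Q}{q_2}r_2(r_2-1)\right\}} - \xi^2. \tag{14}$$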
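Formulae (10), (11), (13) and (14) share one structure: $t$ is a mixture of the case B and case C hypergeometric laws, weighted equally under $H_0$ and by $P/p_1$ and $Q/q_2$ under $H_1$. The Python sketch below (hypothetical names) evaluates the closed forms and checks them against direct summation over the terms of (9). With $r_1 = 105$, $r_2 = 95$ it gives $\lambda \approx 49.8763$, and with $r_1 = 120$, $r_2 = 80$, $P = 0.6$, $p_1 = 0.7$, $p_2 = 0.45$ it gives $\xi \approx 48.0257$, as in Tables 4 and 5.

```python
import math

def odd_moments(r1, r2, w_b=1.0, w_c=1.0):
    """Mean and variance of t for 2t + 1 groups.

    w_b = w_c = 1 gives lambda and mu^2, (10)-(11); w_b = P/p1, w_c = Q/q2 gives xi and mu_1^2."""
    n = r1 + r2 - 2
    den = w_b * r1 * (r1 - 1) + w_c * r2 * (r2 - 1)
    mean = r1 * r2 * (w_b * (r1 - 1) ** 2 + w_c * (r2 - 1) ** 2) / (n * den)
    second = r1 * r2 * (w_b * (r1 - 1) ** 2 * (r1 * r2 - r2 - 1)
                        + w_c * (r2 - 1) ** 2 * (r1 * r2 - r1 - 1)) / (n * (n - 1) * den)
    return mean, second - mean ** 2

def odd_moments_exact(r1, r2, w_b=1.0, w_c=1.0):
    """Same quantities by direct summation (case B terms weighted w_b, case C terms w_c)."""
    terms = {t: w_b * math.comb(r1 - 1, t) * math.comb(r2 - 1, t - 1)
                + w_c * math.comb(r1 - 1, t - 1) * math.comb(r2 - 1, t)
             for t in range(1, max(r1, r2) + 1)}
    total = sum(terms.values())
    mean = sum(t * w for t, w in terms.items()) / total
    return mean, sum(t * t * w for t, w in terms.items()) / total - mean ** 2

print(odd_moments(105, 95), odd_moments(120, 80, 0.6 / 0.7, 0.4 / 0.55))
```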


5. Equivalence of the tests

We desire to show that in large samples the cases of odd and even numbers of groups tend to equivalence, in the sense that the distribution of $t$ in both cases becomes identical. As we have indicated that in both cases the distribution of $t$ tends to normality, it is only necessary to show that the means and variances agree. Table 3 shows the scheme of means and variances that we have obtained. Hence we require to show, for large values of $r_1$ and $r_2$, that (i) $\lambda$ and $\xi$ tend to $\tau$ and (ii) $\mu^2$ and $\mu_1^2$ tend to $\sigma^2$. We note that, in the particular case $r_1 = r_2 = r$, the expressions reduce to

$$\tau = \tfrac12(r+1), \qquad \lambda = \xi = \tfrac12 r, \qquad \sigma^2 = \frac{(r-1)^2}{4(2r-3)}, \qquad \mu^2 = \mu_1^2 = \frac{r(r-2)}{4(2r-3)}.$$


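Table 3

| | Mean of $t$ under $H_0$ | Mean of $t$ under $H_1$ | Variance of $t$, $H_0$ and $H_1$ |
|---|---|---|---|
| Even no. of groups ($2t$) | $\tau$ | $\tau' = \tau + \sigma^2\log_e\dfrac{q_1 p_2}{q_2 p_1}$ | $\sigma^2$ |
| Odd no. of groups ($2t + 1$) | $\lambda$ | $\lambda' = \xi + \mu_1^2\log_e\dfrac{q_1 p_2}{q_2 p_1}$ | $\mu^2$ ($H_0$); $\mu_1^2$ ($H_1$) |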

When $P = 0.5$ we have $p_2 = q_1$ and $q_2 = p_1$, so that $P/p_1 = Q/q_2$; hence $\xi = \lambda$ and $\mu_1^2 = \mu^2$ whatever $r_1$ and $r_2$. In general we see, from (7) and (10), that as $r_1$ and $r_2$ get large $\tau$ and $\lambda$ tend to a common value. Similarly, from (8) and (11), $\sigma^2$ and $\mu^2$ tend to a common value. Under $H_1$, however, the expressions are a little more complicated. We see, from the forms of (7) and (13), that $\tau$ and $\xi$ both tend to $r_1 r_2/(r_1 + r_2 - 2)$. From (8) and (14), together with (13), we see that $\sigma^2$ and $\mu_1^2$ both tend to

$$\frac{\cdots}{(r_1 + r_2 - 2)^2(r_1 + r_2 - 3)}.$$

It is difficult to compare numerically the approximate with the true results as there are so many variables, but we will take some specific cases.

(i) $P = 0.5$, whence $\xi = \lambda$ and $\mu_1^2 = \mu^2$; $r_1 + r_2 = 200$.

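Table 4

| | $\tau$ | $\xi = \lambda$ | $\sigma^2$ | $\mu_1^2 = \mu^2$ |
|---|---|---|---|---|
| $r_1 = 100$, $r_2 = 100$ | 50.5 | 50 | 12.4378 | 12.4365 |
| $r_1 = 105$, $r_2 = 95$ | 50.1206 | 49.8763 | 12.3744 | 12.3719 |
| $r_1 = 110$, $r_2 = 90$ | 49.7437 | 49.5050 | 12.1853 | 12.1915 |
| $r_1 = 115$, $r_2 = 85$ | 49.1156 | 48.8859 | 11.8733 | 11.8854 |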


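(ii) $P \neq 0.5$, $r_1 + r_2 = 200$.

Table 5

For $r_1 = 120$, $r_2 = 80$ and for $r_1 = 80$, $r_2 = 120$: $\tau = 48.2362$, $\lambda = 48.0188$; $\sigma^2 = 11.4433$, $\mu^2 = 11.4723$.

| $p_1$ ($P = 0.6$; $r_1 = 120$, $r_2 = 80$) | $p_2$ | $\xi$ | $\mu_1^2$ | $p_1$ ($P = 0.4$; $r_1 = 80$, $r_2 = 120$) | $p_2$ | $\xi$ | $\mu_1^2$ |
|---|---|---|---|---|---|---|---|
| 0.7 | 0.45 | 48.0257 | 11.4679 | 0.5 | 0.3333 | 48.0238 | 11.4689 |
| 0.75 | 0.375 | 48.0280 | 11.4701 | 0.6 | 0.2667 | 48.0273 | 11.4696 |
| 0.8 | 0.3 | 48.0299 | 11.4709 | 0.7 | 0.2000 | 48.0299 | 11.4708 |
| 0.85 | 0.225 | 48.0314 | 11.4766 | 0.8 | 0.1333 | 48.0319 | 11.4725 |
| 0.9 | 0.15 | 48.0327 | 11.4770 | 0.9 | 0.0667 | 48.0335 | 11.4738 |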


These tables show that the differences involved do not appear to be very large when the 
sequence is of some length. 

6. The total-group test

We have obtained in the preceding sections the power function for even and odd numbers of groups separately in a sequence of $r_1$ 1's and $r_2$ 0's. We will now combine these results to form what David (1947) has termed the conditional power function. That is, we will find the distribution of $T$, the total number of groups, in a sequence of $r_1$ 1's and $r_2$ 0's under $H_1$. We have shown that for an even ($2t$) or odd ($2t + 1$) number of groups $t$ tends to be normally distributed with mean $\tau'$ and variance $\sigma^2$. Now in the limit, i.e. as the sequence gets large, the probabilities of getting an even or odd number of groups will each tend to a half. This can be seen if we consider the discrete set of probabilities graduated by a normal curve. Hence we can combine the two distributions to give a new mean, $\theta$, and variance, $\nu^2$, of the distribution of $T$:

$$\theta = \Sigma(2t)\,P\{\text{even } t\} + \Sigma(2t+1)\,P\{\text{odd } t\},$$

where the summation is taken over all possible values of $t$. In the limit we will get

$$\theta = 2\tau' + \tfrac12.$$

Similarly, we know that

$$\sigma^2 = 2\Sigma(t^2)\,P\{\text{even } t\} - \tau'^2 = 2\Sigma(t^2)\,P\{\text{odd } t\} - \tau'^2.$$

Thus

$$\nu^2 = \Sigma(2t)^2 P\{\text{even } t\} + \Sigma(2t+1)^2 P\{\text{odd } t\} - (2\tau' + \tfrac12)^2$$
$$= 2(\sigma^2 + \tau'^2) + 2(\sigma^2 + \tau'^2) + 2\tau' + \tfrac12 - 4\tau'^2 - 2\tau' - \tfrac14$$
$$= 4\sigma^2 + \tfrac14.$$

Under $H_0$, however, we may calculate the exact values of the mean and variance of $T$ following the method used in §4 and given by Wald & Wolfowitz (1940, p. 151). As the sequence gets large, $T$ may be assumed to be normally distributed with

$$\text{Mean} = \frac{2r_1 r_2}{r_1 + r_2} + 1, \tag{15}$$

$$\text{Variance} = \frac{2r_1 r_2(2r_1 r_2 - r_1 - r_2)}{(r_1 + r_2)^2(r_1 + r_2 - 1)}. \tag{16}$$


7. The hypergeometric distribution

If we have $2t$ groups, then from (3),

$$P\{2t \mid r_1 r_2 H_0\} = \frac{{}^{r_1-1}C_{t-1}\,{}^{r_2-1}C_{t-1}}{{}^{r_1+r_2-2}C_{r_1-1}},$$

an expression proportional to a term in a hypergeometric series and also equal to the chance of the partition in a 2×2 table with fixed margins (Table 6).

Table 6

| | | |
|---|---|---|
| $r_1 - t$ | $t - 1$ | $r_1 - 1$ |
| $t - 1$ | $r_2 - t$ | $r_2 - 1$ |
| $r_1 - 1$ | $r_2 - 1$ | $r_1 + r_2 - 2$ |

Hence the mean of $(r_1 - t)$ is $(r_1 - 1)^2/(r_1 + r_2 - 2)$, or mean $t = (r_1 r_2 - 1)/(r_1 + r_2 - 2)$, and the variance of $(r_1 - t)$, and therefore of $t$, is $(r_1 - 1)^2(r_2 - 1)^2/\{(r_1 + r_2 - 2)^2(r_1 + r_2 - 3)\}$, results which agree with those in equations (7) and (8). These are analogous to the expressions given by Stevens (1939) for the number of groups of 1's and 2's in a circular sequence, when there must be an even number of groups.
For the case of an odd number of groups we may take the two terms of (9) separately, the first term giving an expression proportional to the probability of $2t + 1$ groups in case C and the second for case B. These expressions are also proportional to a term in a different hypergeometric series and equal to the chances of the partitions in 2×2 tables with fixed margins (Tables 6a, b).

Table 6a. Case B

| | | |
|---|---|---|
| $r_1 - t - 1$ | $t$ | $r_1 - 1$ |
| $t - 1$ | $r_2 - t$ | $r_2 - 1$ |
| $r_1 - 2$ | $r_2$ | $r_1 + r_2 - 2$ |

Table 6b. Case C

| | | |
|---|---|---|
| $r_1 - t$ | $t - 1$ | $r_1 - 1$ |
| $t$ | $r_2 - t - 1$ | $r_2 - 1$ |
| $r_1$ | $r_2 - 2$ | $r_1 + r_2 - 2$ |


The conditional power function obtained in the preceding sections is similar in many respects to that obtained by Patnaik (1948) in testing for the difference between two proportions when the data are put in the form of a 2×2 table. His conditional power function (pp. 162–3 of his paper) is the same as that derived here for $t$, remembering that he is using '$a$' in the notation of our Table 1 as his criterion.

The usual procedure in the past to test for randomness in a sequence of two alternatives has been to apply the $\chi^2$ test to Table 1. It is clear, however, that the underlying basis is the group test. The procedure now suggested is that we deduce $T$ from Table 1. Then our test would be:

(i) Find $r_1$, $r_2$ and $T$ from the 2×2 table formed.

(ii) Calculate the mean and variance from (15) and (16).

(iii) From the normal curve tables find the probability of getting the observed value of $T$ or less under $H_0$.

(iv) Reject the hypothesis of independence if this probability is less than some preassigned level $\alpha$.
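The four steps translate into a few lines of Python. This sketch assumes the caller knows which case (A–D) applies, since the table alone does not say how the sequence began and ended; `group_test` is a hypothetical name. The continuity correction is optional; the worked example of §9 appears to use none.

```python
import math
from statistics import NormalDist

def group_test(a, b, c, d, case, continuity=False):
    """Steps (i)-(iv) for a 2x2 table laid out as Table 1, given the case (A-D) of section 2."""
    T = b + c + 1                                  # groups = changes of symbol + 1
    r1 = a + b + (1 if case in ("A", "B") else 0)  # 1's: 'present 1' column + first symbol
    r2 = c + d + (1 if case in ("C", "D") else 0)
    R = r1 + r2
    mean = 2 * r1 * r2 / R + 1                                        # (15)
    var = 2 * r1 * r2 * (2 * r1 * r2 - r1 - r2) / (R ** 2 * (R - 1))  # (16)
    sd = math.sqrt(var)
    z = (T + (0.5 if continuity else 0.0) - mean) / sd
    return {"r1": r1, "r2": r2, "T": T, "mean": mean, "sd": sd, "z": z,
            "p": NormalDist().cdf(z)}              # normal approximation to P{T' <= T | H0}

# Cochran's wet/dry months (Table 11), wet = 1, assuming case C as in section 9.
print(group_test(542, 582, 582, 759, "C"))
```

For Table 11 under case C it returns $T = 1165$, a mean of about 1224.36, a standard error of about 24.63 and $z \approx -2.41$, agreeing with §9.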

In large sequences the two tests may be shown to tend to equivalence, but how far the difference between the tests is important for small samples has not yet been fully investigated, although comparisons have been made in a number of cases. As an example, Table 7 gives the probabilities of having $T_0$ groups or fewer in a sequence of 40 using the different methods of evaluation. For the 'normal' group test, equations (15) and (16) have been used with a continuity correction and the normal curve tables. For $\chi^2$, the probability in brackets is obtained with a continuity correction, the other probability being obtained without it. The last column gives the exact chances calculated from equations (3) and (9).

It can be seen that the exact and approximate distributions of $T$ are very similar, but with $\chi^2$ the use of a continuity correction appears to make all the probabilities too high. This discrepancy was noted in a number of other cases. It should also be noted that when $T$ is odd the probability level obtained from $\chi^2$ depends on whether we have case B or case C. The difference between the two might be important in cases where the probabilities were near the preassigned significance level.





Table 7. Sequence of 40, $r_1 = 25$, $r_2 = 15$

| $T_0$ groups | 'Normal' | $\chi^2$: Cases A and D | $\chi^2$: Case B | $\chi^2$: Case C | Exact |
|---|---|---|---|---|---|
| 12 | 0.0066 | 0.0063 (0.0163) | – | – | 0.0065 |
| 13 | 0.0162 | – | 0.0144 (0.0324) | 0.0193 (0.0426) | 0.0164 |
| 14 | 0.0362 | 0.0363 (0.0733) | – | – | 0.0366 |
| 15 | 0.0728 | – | 0.0656 (0.1208) | 0.0847 (0.1525) | 0.0738 |
| 16 | 0.1330 | 0.1339 (0.2220) | – | – | 0.1329 |
| 17 | 0.2206 | – | 0.2025 (0.3105) | 0.2489 (0.3707) | 0.2215 |

The table gives $P\{T \le T_0 \mid H_0\}$.
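Equations (3) and (9) give the exact null distribution of $T$ once their constants are written out: of the ${}^{R}C_{r_1}$ equally likely arrangements, $2\,{}^{r_1-1}C_{t-1}\,{}^{r_2-1}C_{t-1}$ have $2t$ groups and ${}^{r_1-1}C_{t}\,{}^{r_2-1}C_{t-1} + {}^{r_1-1}C_{t-1}\,{}^{r_2-1}C_{t}$ have $2t + 1$. The Python sketch below (hypothetical names) computes the 'Exact' and continuity-corrected 'Normal' columns of Table 7; for $r_1 = 25$, $r_2 = 15$ it gives about 0.0164 and 0.0162 at $T_0 = 13$.

```python
from math import comb
from statistics import NormalDist

def exact_cdf_T(r1, r2, T0):
    """P{T <= T0 | H0}: counts behind (3) and (9) over all arrangements of r1 1's, r2 0's."""
    count = 0
    for T in range(2, T0 + 1):
        t, odd = divmod(T, 2)
        if odd:   # 2t + 1 groups: cases B and C
            count += comb(r1 - 1, t) * comb(r2 - 1, t - 1) + comb(r1 - 1, t - 1) * comb(r2 - 1, t)
        else:     # 2t groups: cases A and D
            count += 2 * comb(r1 - 1, t - 1) * comb(r2 - 1, t - 1)
    return count / comb(r1 + r2, r1)

def normal_cdf_T(r1, r2, T0):
    """Normal approximation from (15) and (16), with a continuity correction."""
    R = r1 + r2
    mean = 2 * r1 * r2 / R + 1
    var = 2 * r1 * r2 * (2 * r1 * r2 - r1 - r2) / (R ** 2 * (R - 1))
    return NormalDist(mean, var ** 0.5).cdf(T0 + 0.5)

for T0 in range(12, 18):
    print(T0, round(normal_cdf_T(25, 15, T0), 4), round(exact_cdf_T(25, 15, T0), 4))
```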


8. The power function; numerical comparisons

All the first set of calculations are based on a sequence of 40 units. The value of $P$ used throughout is 0.5, the significance of which is mentioned in the next section, and, as we are concerned with the case of positive dependence, $p_1$ is taken greater than $P$.

Table 8 compares the values of the power function of the group test for different compositions of a sequence of 40 units. These values were calculated from the normal curve approximation, using a continuity correction. Allowing for differences in the value of $\alpha$ owing to the discontinuity, the values are very consistent, and the test appears to be just as powerful over the limited range considered whether or not the sequence has equal numbers of 1's and 0's. $k$, given at the bottom of the table, is the probability, under the hypothesis $H_0$, that in a sequence of 40 units with $P = 0.5$ there will be just $r_1$ 1's.

Table 8. Power functions for sequences of 40 ($r_2 = 40 - r_1$)

| $p_1$ \ $r_1$ | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 |
|---|---|---|---|---|---|---|---|---|
| 0.5* | 0.0274 | 0.0281 | 0.0315 | 0.0348 | 0.0424 | 0.0536 | … | 0.0223 |
| 0.55 | 0.0995 | 0.1068 | … | … | … | … | 0.0541 | … |
| 0.6 | 0.2566 | 0.2591 | 0.2688 | … | … | … | 0.1513 | 0.1807 |
| 0.65 | 0.5076 | 0.5150 | 0.5283 | 0.5510 | 0.5748 | … | 0.3335 | 0.3683 |
| 0.7 | 0.7674 | 0.7694 | 0.7783 | 0.7913 | … | … | 0.5838 | 0.6103 |
| 0.75 | 0.9346 | 0.9348 | 0.9340 | 0.9372 | … | … | 0.8207 | … |
| 0.8 | … | 0.9920 | 0.9915 | 0.9920 | … | … | 0.9679 | 0.9586 |
| 0.85 | … | 0.9998 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9966 | 0.9963 |
| 0.9 | 1.0 | 1.0 | 1.0 | 1.0 | … | … | 1.0 | 0.9999 |
| $k$ | 0.1254 | 0.1194 | 0.1031 | 0.0807 | 0.0572 | … | … | 0.0109 |

* When $p_1 = 0.5$, the value of the power function is the value of the first kind of error, usually denoted by $\alpha$.

We may obtain the overall power function by considering a sequence of fixed length, say 40. Then, assuming that $r_1$ is normally distributed with mean $RP$ and variance $RPQ(1 + \delta)/(1 - \delta)$, where $\delta = p_1 - p_2$ and $R = r_1 + r_2$ (Uspensky, 1937, p. 301, following Markoff), we can use Table 8, extended slightly beyond $r_1 = 27$, to get this overall power function. We do this by weighting the tabulated powers by the probability of obtaining that division of $r_1$ and $r_2$ and then summing for all possible divisions of $R$. This would tell us that if we drew samples of 40 from a population for which $P = 0.5$ and $\delta = \delta_0$, then, using this group test, the chance of detecting that $\delta \neq 0$ is the power corresponding to $p_1 = 0.5 + \tfrac12\delta_0$. These powers are given in Table 9 alongside those for a group test devised by Bateman.
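The weighting just described can be sketched in Python. This is an illustration under stated assumptions, not a reproduction of Table 9: the conditional power uses the $H_0$ critical value from (15) and (16) and the $H_1$ moments $2\tau' + \tfrac12$ and $4\sigma^2 + \tfrac14$ of §6, both with continuity corrections, and takes $P = 0.5$ (so $p_2 = q_1$, $q_2 = p_1$ and $\delta = 2p_1 - 1$). The function names are hypothetical and $p_1$ must be less than 1. For $r_1 = r_2 = 20$ and $p_1 = 0.6$ the conditional power comes out at about 0.259.

```python
import math
from statistics import NormalDist

def conditional_power(r1, r2, p1, alpha=0.05):
    """Normal-curve power of the total-group test given r1, r2, with P = 0.5 (so p2 = q1)."""
    R = r1 + r2
    # H0 critical value from (15)-(16): largest integer T with cdf(T + 1/2) <= alpha.
    h0 = NormalDist(2 * r1 * r2 / R + 1,
                    math.sqrt(2 * r1 * r2 * (2 * r1 * r2 - r1 - r2) / (R ** 2 * (R - 1))))
    T_alpha = math.floor(h0.inv_cdf(alpha) - 0.5)
    # H1 law of T: mean 2 tau' + 1/2, variance 4 sigma^2 + 1/4 (section 6), tau' from (6)-(8).
    tau = (r1 * r2 - 1) / (R - 2)
    sigma2 = (r1 - 1) ** 2 * (r2 - 1) ** 2 / ((R - 2) ** 2 * (R - 3))
    q1 = 1.0 - p1
    tau_prime = tau + sigma2 * math.log((q1 * q1) / (p1 * p1))   # p2 = q1, q2 = p1
    h1 = NormalDist(2 * tau_prime + 0.5, math.sqrt(4 * sigma2 + 0.25))
    return h1.cdf(T_alpha + 0.5)

def overall_power(R, p1, alpha=0.05):
    """Weight conditional powers by a Normal law for r1: mean RP, variance RPQ(1+d)/(1-d)."""
    P = 0.5
    d = 2.0 * p1 - 1.0                      # delta = p1 - p2 with p2 = q1
    law = NormalDist(R * P, math.sqrt(R * P * (1 - P) * (1 + d) / (1 - d)))
    num = den = 0.0
    for r1 in range(2, R - 1):              # keep r1, r2 >= 2 so (7)-(8) stay defined
        w = law.cdf(r1 + 0.5) - law.cdf(r1 - 0.5)
        num += w * conditional_power(r1, R - r1, p1, alpha)
        den += w
    return num / den

for p1 in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(p1, round(overall_power(40, p1), 4))
```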


Table 9. Comparison of power functions, $r_1 + r_2 = 40$

| Test used | $p_1 = 0.5$ | 0.55 | 0.6 | 0.65 | 0.7 | 0.75 | 0.8 | 0.85 | 0.9 |
|---|---|---|---|---|---|---|---|---|---|
| $T$, as described above | 0.0329 | 0.1080 | … | 0.6041 | … | 0.9183 | 0.9864 | 0.9990 | 1.0000 |
| Bateman's | 0.0266 | … | 0.2484 | 0.4867 | 0.7397 | 0.9138 | 0.9846 | 0.9989 | 1.0000 |


Bateman (1948b) has shown that when $P = 0.5$ the total number of groups ($T$) is distributed as the binomial $(p_1 + q_1)^{R-1}$. Hence we may compare this test with the power function just computed. The slight differences are due to the fact that whereas Bateman's test has the same critical region whatever $r_1$, our test has a critical region depending on $r_1$. The agreement is, however, very good. But when $P \neq 0.5$ Bateman's formula no longer applies and our approximate test must be used.
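Reading Bateman's result as saying that the number of changes, $T - 1$, is binomial with index $R - 1$ and probability $q_1$ of a change (when $P = 0.5$ we have $p_2 = q_1$, so every transition changes symbol with probability $q_1$), his critical region and power can be computed exactly. In the Python sketch below the helper names are hypothetical. With $R = 40$ it gives the critical region $T \le 14$ with size about 0.0266, the value shown for Bateman's test in Table 9.

```python
import math

def binom_cdf(k, n, p):
    """P{X <= k} for X binomial with index n and probability p (0 when k < 0)."""
    return sum(math.comb(n, j) * p ** j * (1.0 - p) ** (n - j) for j in range(k + 1))

def bateman_test(R, p1, alpha=0.05):
    """Group test with P = 0.5: reject randomness when the number of changes is small."""
    n = R - 1
    c = -1
    while binom_cdf(c + 1, n, 0.5) <= alpha:   # largest c with P{changes <= c | H0} <= alpha
        c += 1
    size = binom_cdf(c, n, 0.5)
    power = binom_cdf(c, n, 1.0 - p1)          # under H1 a change occurs with probability q1
    return c + 1, size, power                  # critical region: T <= c + 1

print(bateman_test(40, 0.5), bateman_test(40, 0.6))
```

Because the critical region depends only on $R$, it is the same whatever the split into $r_1$ and $r_2$, which is the contrast with the $T$ test drawn in the text.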

The difficulty in getting from Table 8 to Table 9 is that it entails very long calculations. As an approximation to the overall power we may follow a method used by Patnaik (1948). Briefly, we replace $r_1$ and $r_2$ by their expected values $RP$ and $R(1 - P)$. The adequacy of this approximation can be seen by comparing the first column in Table 8 with Table 9. The agreement is very good.

Using the 0.05 nominal significance level for $H_0$, we can find the minimum length of sequence such that the chance is 0.9 of rejecting randomness by the group test, when $p_1$ has values greater than $P$. Table 10 gives these minimum lengths of sequence. They are obtained in the same way as before, successive values of $r_1$ and $r_2$ being taken until the power just becomes 0.9. The table shows that if we are interested in picking out a degree of dependence of a certain magnitude, the length of sequence necessary to do this with a given power varies considerably according to the value of $P$. In practice, however, it is possible that we would want the test to be equally powerful for the same values of some function of $\delta$ and $P$, and not just for equal values of $\delta$ alone.










Table 10

| $P$ \ $\Delta = p_1 - P = \delta(1 - P)$ | 0.05 | 0.10 | 0.15 | 0.20 | 0.25 | 0.30 | 0.35 | 0.40 |
|---|---|---|---|---|---|---|---|---|
| 0.3 | 1800 | 466 | 210 | 126 | 92 | 70 | 46 | 34 |
| 0.4 | 1240 | 320 | 120 | 76 | 48 | 38 | 24 | … |
| 0.5 | 904 | 220 | 96 | 56 | 34 | 22 | – | – |
| 0.6 | 540 | 120 | 56 | 38 | 20 | – | – | – |
| 0.7 | 124 | 63 | 34 | 23 | – | – | – | – |


9. Application of results

In a control chart an examination of runs of values above and below the median is often of assistance in eliminating trend effects. Mosteller (1941) uses the length of the runs, while Shewhart (cited by Swed & Eisenhart, 1943) suggests the use of the technique we have discussed, employing the number of runs. For short sequences the power of the group test is low, but for longer sequences, say 40 or over, the technique provides a useful method for picking out clustering effects. It is assumed, from previous results, that the median value is known, so that it can be used to make P = 0.5. The foregoing test has been concerned solely with the case of positive dependence, i.e. δ > 0. If it were desired to test for dependence in both directions, a two-ended test would have to be made on similar lines.

Table 11

                          Present month
Preceding month      Wet      Dry     Total
Wet                  542      582      1124
Dry                  582      759      1341
Total               1124     1341      2465


Cochran (1938), in considering the distribution of wet and dry months, gives a 2 × 2 table (Table 11). A wet month is defined as a month in which rainfall is greater than the average for that month over the period 1881-1915. Hence we have a P of approximately 0.5. The alternative to randomness in the order in which wet and dry months follow each other is a persistence of one kind of weather, which gives us the basic conditions for a Markoff chain. Since we are considering either case B or case C of §2, the value of T is 2 × 582 + 1 = 1165. Further, assuming that we have case C,* so that r_1 = 1124 and r_2 = 1342, we get mean T = 1224.36 and s.e. of T = 24.63. Hence we refer the ratio (1224.36 − 1165)/24.63 = 2.41 to the normal curve tables and find that the probability of getting a value of T less than or equal to 1165,

* To differentiate between cases B and C we would need to know whether the sequence started with 
a wet or a dry month. 






Randomness in a sequence of two alternatives 

under the hypothesis of independence, is 0.0080, or odds of 125 to 1 against independence. If a continuity correction is used the odds are 119 to 1 against independence. Thus we would reject the hypothesis H_0 in favour of H_1, the hypothesis that there is positive dependence, in the sense that wet months tend to follow wet months and similarly for dry months. Cochran found the proportion of months following a wet month which were wet, and also the proportion following a dry month which were wet. These proportions are 0.4822 and 0.4340 respectively. The s.e. of the difference is, from the binomial theorem,

$$\sqrt{\frac{1124}{2465}\cdot\frac{1341}{2465}\left(\frac{1}{1124}+\frac{1}{1341}\right)} = \pm 0.02013,$$

and from normal curve tables we get odds of 120 to 1 against the observed difference arising by chance in a sample from a population for which these two proportions are equal. This test is equivalent to calculating √χ², and if a continuity correction is employed the odds become 108 to 1 against independence. Thus, using either test in this case of a long sequence, we would reject the hypothesis of independence. The difference due to the use of a continuity correction for χ², which was noticed for short sequences, is still apparent.
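The arithmetic of the wet and dry months example can be reproduced in a few lines. The sketch uses the usual normal approximation for the number of groups T given r_1 and r_2, with mean 1 + 2r_1r_2/n and variance 2r_1r_2(2r_1r_2 − n)/{n²(n − 1)}. That reproduces the mean and s.e. quoted above. It also uses the pooled binomial standard error for the difference of two proportions. No continuity correction is applied, and the function names are ours.

```python
from math import erfc, sqrt


def runs_test(t, r1, r2):
    """Normal approximation for the number of groups T under randomness.

    Returns (mean, s.e., standardized ratio, lower-tail probability).
    """
    n = r1 + r2
    mean = 1 + 2 * r1 * r2 / n
    var = 2 * r1 * r2 * (2 * r1 * r2 - n) / (n ** 2 * (n - 1))
    se = sqrt(var)
    z = (mean - t) / se
    return mean, se, z, 0.5 * erfc(z / sqrt(2))


def two_proportion_test(x1, n1, x2, n2):
    """Difference p1 - p2, its pooled binomial s.e., ratio and one-sided tail."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return p1 - p2, se, z, 0.5 * erfc(z / sqrt(2))


# Table 11, case C: T = 2 * 582 + 1, r1 = 1124, r2 = 1342
# (text: mean 1224.36, s.e. 24.63, ratio 2.41, P = 0.0080)
mean, se, z, p = runs_test(2 * 582 + 1, 1124, 1342)
print(round(mean, 2), round(se, 2), round(z, 2), round(p, 4))

# Wet after wet: 542 of 1124; wet after dry: 582 of 1341
diff, se_d, z_d, p_d = two_proportion_test(542, 1124, 582, 1341)
print(round(diff, 4), round(se_d, 4), round(1 / p_d))
```

The odds quoted in the text correspond to 1/P, which is how the last figure is printed.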

10. Summary 

We have obtained an approximate formula which makes it easy to find the power of the group test for a sequence of two alternatives. Here H_0 is the hypothesis that there is randomness in the sequence, and under H_1 the dependence follows a simple Markoff chain. This formula has been used to find the minimum lengths of sequence necessary to pick out various degrees of dependence with a given power. Finally, the χ² test for independence in a 2 × 2 table is shown to lead to the same test for long sequences.

The author wishes to thank Dr F. N. David and Professor E. S. Pearson for their 
guidance during the preparation of this paper. 

REFERENCES 

Bateman, G. I. (1948a). Biometrika, 35, 97.

Bateman, G. I. (1948b). Unpublished thesis.

Cochran, W. G. (1938). Quart. J. R. Met. Soc. 64, 631.

David, F. N. (1947). Biometrika, 34, 336.

Fréchet, M. (1938). Recherches théoriques modernes sur le calcul des probabilités. Book 2.

Mosteller, F. (1941). Ann. Math. Statist. 12, 228.

Patnaik, P. B. (1948). Biometrika, 35, 157.

Stevens, W. L. (1939). Ann. Eugen., Lond., 9, 10.

Swed, F. S. & Eisenhart, C. (1943). Ann. Math. Statist. 14, 66.

Uspensky, J. V. (1937). Introduction to Mathematical Probability. McGraw-Hill Book Co.

Wald, A. & Wolfowitz, J. (1940). Ann. Math. Statist. 11, 147.





A GENERAL DISTRIBUTION THEORY FOR A CLASS OF
LIKELIHOOD CRITERIA

By G. E. P. BOX

Imperial Chemical Industries, Dyestuffs Division Headquarters, Blackley, Manchester

1. Introduction 

The likelihood ratio method of Neyman & Pearson (1928) has been used by many different workers for the derivation of criteria appropriate for the testing of a large variety of hypotheses. Plackett (1946), in a recent survey of literature on testing the equality of variances and covariances, lists, on this problem alone, criteria for the testing of no less than thirty-one hypotheses investigated at different times by workers in this field. Most of the criteria either have been or can be arrived at by the likelihood ratio method. In the preface to his survey Plackett says: 'Generally speaking the difficulties in testing such hypotheses lie not so much in deriving criteria, but in finding their exact distributions when the hypotheses are true and determining the best critical region to adopt.'

Although in many cases the exact distributions cannot be obtained in a form which is of practical use, it is usually possible to obtain the moments, and these may be used to obtain approximations. In some cases, for instance, a suitable power of the likelihood statistic has been found to be distributed approximately in the type I form. Good approximations have then been obtained by equating the moments of the likelihood statistic to those of this curve. For example, in the original paper on the L_1 test for homogeneity of variances, Neyman & Pearson (1931) suggested that the distribution could be approximately represented in this way. Later, Bishop & Nair (1939) showed that the significance points obtained by Nayer (1936) using this method were in excellent agreement with the true values. The fitting of the type I curve is simple once the moments are obtained, but these moments, being products of Γ-functions, are usually rather troublesome to calculate. To overcome this difficulty, Bishop (1939) worked on the distribution of the multivariate equivalent of the L_1 test (the test for constancy of variances and covariances in k p-variate samples). He derived empirical expressions for the parameters of the appropriate type I curve, thus avoiding the troublesome intermediate step of calculating moments. Bishop mentions that Nair succeeded in finding similar expressions on a theoretical basis, and Tukey & Wilks (1946) give a more general theoretical method for finding approximations of this kind.

A different line of approach was adopted by Bartlett (1937). Neyman & Pearson had pointed out in their original paper that, if N' is the total sample size, −N' log_e L_1 would be asymptotically distributed as χ². From considerations of sufficiency Bartlett obtained what was in effect a modified form of this logarithmic statistic, which, following Hartley & Pearson (1946), we shall call M. From the moments of the modified likelihood statistic he was able to develop a scale factor C, which was related to the effective sizes of samples and which approached the value unity as the sample sizes became large. The distribution of M/C was then very well represented by χ² even when the samples were small. Bartlett later (1938) used the same method to obtain an approximation for the test of significance in multivariate analysis. In 1940 Hartley, starting from the moments of the modified likelihood statistic,






obtained an asymptotic series of χ² integrals for the logarithmic statistic M which agreed very closely with the exact distribution. In 1941 Wald & Brookner investigated an entirely different problem: the distribution of Wilks's statistic for testing the independence of k sets of variates. Again starting from the moments of the likelihood statistic λ, they eventually obtained an expression for the distribution of a logarithmic statistic (a negative multiple of log_e λ) in the form of an asymptotic χ² series. This was later modified by Rao (1948) in the important special case of two groups, when λ corresponds to the test of significance in multivariate analysis previously referred to. Neither Wald & Brookner nor Rao investigated the accuracy of these series.

It is possible, therefore, to distinguish two definite lines of approach which have been used in certain cases where the moments of the likelihood criteria are known but the exact distributions are not. On the one hand, the moments have been used to fit a Pearson-type curve. This usually gives an adequate approximation, but the calculation of the moments involves so much labour that the method would not be attractive for routine significance testing. The exceptions would be if methods such as Bishop's could be used to obtain the parameters of the fitted curve directly, or if the results from the method could be tabled. On the other hand, the general expression for the moments of the likelihood statistic has been used in certain cases to obtain, for the distribution of the logarithmic statistic M, a χ² approximation and an asymptotic χ² series. It will be the object of the present paper to investigate this second line of attack in some detail.

The method will be investigated in particular for two general criteria:

(1) The test of constancy of variance and covariance of k sets of p-variate samples. This includes, as an important special case when p = 1, the test for constancy of variance in k samples.

(2) Wilks's test for the independence of k sets of residuals, the lth set having p_l variates. When k = 2 this corresponds to the test of significance used in multivariate regression and in the analysis of variance and covariance. When k = 2 and p_1 or p_2 is unity, it gives the corresponding well-known univariate tests. In the latter case, of course, the exact distributions are known.

We shall refer to these two criteria as generalized tests for homoscedasticity and independence, respectively. The assumption of normality or multinormality for the distributions of the original observations will be made throughout this paper.

Taking for our test function M, a negative multiple of the natural logarithm of the likelihood statistic (or of some modification of it), we shall obtain in each case

(a) a series solution which, we shall demonstrate, agrees very closely with the exact distribution,

(b) an approximate solution using a single χ² distribution,

(c) a rather better approximation using a single F distribution.

The accuracy of the various methods and the relation of the results to those of other 
workers will be discussed. 


2. The generalized test of homoscedasticity

The univariate statistic. The L_1 statistic of Neyman & Pearson for testing the homogeneity of a set of variances takes the form of the ratio of a weighted geometric mean of variances to a weighted arithmetic mean, where the weights are the sample numbers. Welch (1936,







1936) generalized the test to cover the case when residuals from a fitted regression equation were tested for homoscedasticity, and derived the moments for a modified criterion in which the weights could have any values whatever. In 1936 Nayer tabled the approximate significance points for L_1 in the case of equal sample numbers. He fitted type I curves to the distributions by the method of moments, as suggested in the original memoir by Neyman & Pearson.

The statistic proposed by Bartlett (1937), which we shall call M, is given by

$$M = N\log_e s^2 - \sum_{l}\nu_l\log_e s_l^2, \qquad s^2 = \frac{\sum_l \nu_l s_l^2}{N},$$

where s_l² is the usual unbiased estimate of the variance in the lth group, l = 1, 2, ..., k, based on sums of squares having ν_l degrees of freedom, and N = Σ_l ν_l. It was later shown (Brown, 1939; Pitman, 1939; Bishop & Nair, 1939) that this criterion is unbiased in the sense used by Neyman & Pearson (1936, 1938). Nair (1939) derived a series solution for the distribution of the likelihood statistic in the case of equal sample numbers. His solution is very involved, but it has been used as a standard against which to check approximations. Bishop & Nair (1939) used this series to check the accuracy of the type I approximation used in Nayer's table. They also checked the Bartlett (1937) approximation and found that both methods were fairly good except when the degrees of freedom were small. In the case of unequal samples, however, Nayer's tables were not available. In view of the labour involved in the type I fit, Bartlett's method was preferred.

Hartley's (1940) asymptotic series depended, to the degree of approximation used, on two parameters c_1 and c_2, which varied with the effective sample size and relative composition of the groups:

$$c_1 = \sum_{l=1}^{k}\frac{1}{\nu_l} - \frac{1}{N}, \qquad c_2 = \sum_{l=1}^{k}\frac{1}{\nu_l^2} - \frac{1}{N^2}.$$

The first is related to Bartlett's scale factor C; in fact

$$C = 1 + \frac{c_1}{3(k-1)}.$$

Tables were afterwards computed by Thompson & Merrington (1946) from Hartley's formula, and comparisons were made with the values calculated by Bishop & Nair.

The multivariate statistic. In the multivariate case Wilks (1932) derived the likelihood ratio test and obtained the moments of the criterion. This is a generalized form of the criterion used in the univariate test, with the determinants of estimated variances and covariances replacing the variances. Bishop (1939) took as his criterion l_1, the 1/N'th power of the likelihood statistic, N' being the total number of observations. He gave reasons for believing that this criterion could be approximately represented by a type I curve

$$p(l_1) = \text{constant}\times l_1^{\,m_1-1}(1-l_1)^{m_2-1} \tag{1}$$

by choosing the values of m_1 and m_2 so that the first two moments of the Pearson curve agreed with those of the criterion. His arguments were supported by the agreement found in a number of trials between the higher moments of the fitted type I curve and those of the criterion. Only in the case of two groups and either one or two variates was it possible to obtain a check against the exact distribution, but in these cases the agreement was very good. Unfortunately the labour involved in the calculation of the first two moments of the criterion was too




great to allow this method to be recommended for routine use. Bishop therefore proceeded as follows:

(a) For the case of equal sample sizes he obtained, empirically, expressions for m_1 and m_2 in terms of the number of observations n in each group, the number of variates p and the number of groups k:

$$m_1 = k(n-p) - 0.01(k-1)(90 - 39p + 9p^2), \qquad m_2 = 0.25(k-1)p(p+1).$$

(b) For unequal sample sizes he proposed approximating to −2N' log_e l_1 by means of a χ² distribution, using a scale factor C in a similar way to that adopted by Bartlett in the univariate case.

He showed that −2N' log_e l_1 is approximately distributed as Cχ²_f, where

= l + 4r £ S {i 2 /2^+i/K-i) + V3(%-i) 2 } 

JLi=i t-i 



-£{(*+<-1 )7 2 ^' + (ft+• - l)l(N' - Tc + 1 - i)+N'lW - k +1 - »)*} 

i =1 



n_l is the number of observations in the lth group, Σ_l n_l = N', and χ²_f is distributed with f = ½(k − 1)p(p + 1) degrees of freedom.

We shall refer to these methods as Bishop's methods (a) and (b). Bishop remarks that the scale factor C is rather troublesome to calculate unless n_l = n for every group. George (1945) was able to evaluate the exact distribution in a number of simple cases. She used her results to test the accuracy of Bishop's approximations and found, in the cases she considered, that (b) was superior to (a).

Plackett (1947) suggested that, in view of the unsatisfactory position with regard to the distribution of this criterion, it might be better to abandon it in favour of an alternative test derived by him. His test had the advantage that the exact distribution was known, at least when p = 1 or 2 and in certain other special cases. Plackett's test, however, has two disadvantages: the results depend on the particular arrangement of observations chosen, and the samples must be of equal size.


2.1. The present approach


Suppose s_ijl is the usual unbiased estimate of the variance or covariance A_ijl between the ith and jth variables in the lth sample, based on sums of squares and products having ν_l degrees of freedom. Suppose there are k such samples, and let s̄_ij = Σ_l ν_l s_ijl / N be the average variance or covariance, where N = Σ_l ν_l. We take as our criterion a generalized form of Bartlett's criterion

$$M = N\log_e|\bar s_{ij}| - \sum_{l=1}^{k}\nu_l\log_e|s_{ijl}| = -N\log_e L_1', \qquad\text{where}\quad L_1' = \frac{\prod_{l=1}^{k}|s_{ijl}|^{\nu_l/N}}{|\bar s_{ij}|}. \tag{5}$$

When the degrees of freedom are equal, M and L'_1 are related to Bishop's l_1 as follows:

$$M = -2N\log_e l_1 \quad\text{and}\quad L_1' = l_1^2. \tag{6}$$

When the degrees of freedom are unequal, L'_1 will differ from the likelihood statistic in weighting. When p = 1, M is the criterion derived by Bartlett and later used by Hartley.
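The generalized criterion (5) needs only the determinants of the sample covariance matrices. The sketch below is a minimal pure-Python illustration with hypothetical helper names and no external libraries. It computes log-determinants by Gaussian elimination without pivoting, which is adequate for small positive-definite matrices. With p = 1 it reduces to Bartlett's M.

```python
from math import log


def log_det(a):
    """log|A| for a symmetric positive-definite matrix (elimination, no pivoting)."""
    a = [row[:] for row in a]
    n = len(a)
    total = 0.0
    for i in range(n):
        pivot = a[i][i]
        if pivot <= 0:
            raise ValueError("matrix is not positive definite")
        total += log(pivot)
        for r in range(i + 1, n):
            factor = a[r][i] / pivot
            for c in range(i, n):
                a[r][c] -= factor * a[i][c]
    return total


def box_m(cov_mats, dfs):
    """M = N log|S| - sum nu_l log|S_l|, S being the df-weighted mean of the S_l."""
    p = len(cov_mats[0])
    n_total = sum(dfs)
    pooled = [[sum(v * s[i][j] for v, s in zip(dfs, cov_mats)) / n_total
               for j in range(p)] for i in range(p)]
    return n_total * log_det(pooled) - sum(v * log_det(s) for v, s in zip(dfs, cov_mats))


s1 = [[2.0, 0.5], [0.5, 1.0]]
s2 = [[1.0, 0.0], [0.0, 3.0]]
print(box_m([s1, s2], [10, 12]))
```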







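We proceed to obtain the moments of L'_1 when the null hypothesis is true. Let c_ijl be the sums of squares and products, based on ν_l degrees of freedom, corresponding to the s_ijl. Then c_ijl = ν_l s_ijl and c_ij = N s̄_ij, so that c_ij = Σ_l c_ijl. The joint probability density of the c_ijl for the lth sample is given by the distribution discovered by Wishart (1928):

$$p(c_{11l}, c_{12l}, \ldots, c_{ppl}) = K(\nu_l)\,|c_{ijl}|^{\frac12(\nu_l-p-1)}\exp\Big(-\tfrac12\sum_{i}\sum_{j}A_l^{ij}c_{ijl}\Big), \tag{7}$$

where

$$K(\nu_l) = \frac{|A_l|^{\frac12\nu_l}}{2^{\frac12 p\nu_l}\,\pi^{\frac14 p(p-1)}\prod_{j=0}^{p-1}\Gamma\{\tfrac12(\nu_l-j)\}}, \tag{8}$$

and A_l is the matrix of the A_l^{ij}, the inverse of the matrix of the A_ijl.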

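When the null hypothesis is true, A_l is the same for each of the samples, A_l = A, l = 1, 2, ..., k, and the gth moment of |c_ij| is

$$\mathcal{E}|c_{ij}|^g = \int\!\cdots\!\int \prod_{l=1}^{k}\Big\{K(\nu_l)\,|c_{ijl}|^{\frac12(\nu_l-p-1)}\Big\}\,|c_{ij}|^{g}\exp\Big(-\tfrac12\sum_i\sum_j A^{ij}c_{ij}\Big)\prod_{l=1}^{k}dc_{11l}\,dc_{12l}\cdots dc_{ppl}; \tag{9}$$

it is also given by

$$\mathcal{E}|c_{ij}|^g = \int\!\cdots\!\int K(N)\,|c_{ij}|^{\frac12(N-p-1)}\,|c_{ij}|^{g}\exp\Big(-\tfrac12\sum_i\sum_j A^{ij}c_{ij}\Big)\,dc_{11}\,dc_{12}\cdots dc_{pp}. \tag{10}$$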

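Writing ν_l(1 + 2h) for ν_l on both sides of the identity, then taking g = −Nh and integrating over the whole space for which the matrices of the c_ijl and c_ij are positive definite, we have

$$\mathcal{E}\Big\{\prod_{l=1}^{k}|c_{ijl}|^{\nu_l h}\,|c_{ij}|^{-Nh}\Big\} = \frac{K\{N(1+2h)\}}{K(N)}\prod_{l=1}^{k}\frac{K(\nu_l)}{K\{\nu_l(1+2h)\}}. \tag{11}$$

That is,

$$\mathcal{E}(L_1'^{\,Nh}) = \mathcal{E}(e^{-hM}) = \frac{K\{N(1+2h)\}}{K(N)}\prod_{l=1}^{k}\Big[\frac{K(\nu_l)}{K\{\nu_l(1+2h)\}}\Big(\frac{N}{\nu_l}\Big)^{p\nu_l h}\Big] \tag{12}$$

$$= \prod_{l=1}^{k}\Big(\frac{N}{\nu_l}\Big)^{p\nu_l h}\;\prod_{j=0}^{p-1}\Bigg[\frac{\Gamma\{\frac12(N-j)\}}{\Gamma\{\frac12(N(1+2h)-j)\}}\prod_{l=1}^{k}\frac{\Gamma\{\frac12(\nu_l(1+2h)-j)\}}{\Gamma\{\frac12(\nu_l-j)\}}\Bigg]. \tag{13}$$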
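Equation (13) can be checked numerically. Differentiating log ℰ(e^{−hM}) at h = 0 gives −ℰ(M), which can be compared with the first cumulant f(1 + A_1) implied by the χ² approximation of Section 3, with A_1 taken from (49). The sketch below uses `math.lgamma` and a central difference. It assumes every ν_l ≥ p so that all Γ arguments are positive, and the function names are ours.

```python
from math import lgamma, log


def log_mgf_neg_m(h, dfs, p):
    """log E(exp(-h M)) under H0, from equation (13); needs every nu_l >= p."""
    n_total = sum(dfs)
    out = p * h * sum(v * log(n_total / v) for v in dfs)
    for j in range(p):
        out += lgamma(0.5 * (n_total - j)) - lgamma(0.5 * (n_total * (1 + 2 * h) - j))
        for v in dfs:
            out += lgamma(0.5 * (v * (1 + 2 * h) - j)) - lgamma(0.5 * (v - j))
    return out


def mean_of_m(dfs, p, step=1e-5):
    """E(M) = -d/dh log E(exp(-h M)) at h = 0, by a central difference."""
    return -(log_mgf_neg_m(step, dfs, p) - log_mgf_neg_m(-step, dfs, p)) / (2 * step)


def first_order_mean(dfs, p):
    """f (1 + A_1): the mean of C chi-square_f with C = 1 + A_1."""
    k, n_total = len(dfs), sum(dfs)
    f = 0.5 * (k - 1) * p * (p + 1)
    a1 = ((2 * p * p + 3 * p - 1) / (6 * (k - 1) * (p + 1))
          * (sum(1 / v for v in dfs) - 1 / n_total))
    return f * (1 + a1)


print(mean_of_m([9, 9], 1), first_order_mean([9, 9], 1))
```

For two univariate samples with nine degrees of freedom each, the exact mean and the first-order value agree to about three decimals.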


We have first proved equation (13) as an analytic identity for real h. It will, however, be generally valid in the range where the functions are analytic. We can thus obtain an expression for the characteristic function of ρM, where ρ is a constant ≤ 1 at our choice, by replacing h by −itρ in the above expression. The reason for introducing the constant ρ will appear later. Further, we write N = νk (i.e. ν is the average of the degrees of freedom) and define new quantities μ, μ_l, β, β_l by the relations

$$\mu_l = \rho\nu_l, \qquad \mu = \rho\nu, \qquad \nu = \mu + \beta, \qquad \nu_l = \mu_l + \beta_l. \tag{14}$$


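With these definitions we obtain the characteristic function of ρM in the form

$$\phi(t) = \prod_{l=1}^{k}\Big(\frac{\nu_l}{N}\Big)^{itp\mu_l}\;\prod_{j=0}^{p-1}\Bigg[\frac{\Gamma\{\tfrac12(k\mu+k\beta-j)\}}{\Gamma[\tfrac12\{k\mu(1-2it)+k\beta-j\}]}\prod_{l=1}^{k}\frac{\Gamma[\tfrac12\{\mu_l(1-2it)+\beta_l-j\}]}{\Gamma\{\tfrac12(\mu_l+\beta_l-j)\}}\Bigg], \tag{15}$$

and taking logarithms we have the cumulant generating function in the form

$$\Psi(t) = g(t) - g(0), \tag{16}$$

where

$$g(t) = -itp\sum_{l=1}^{k}\mu_l\log\frac{N}{\nu_l} + \sum_{j=0}^{p-1}\Big\{\sum_{l=1}^{k}\log\Gamma[\tfrac12\{\mu_l(1-2it)+\beta_l-j\}] - \log\Gamma[\tfrac12\{k\mu(1-2it)+k\beta-j\}]\Big\}, \tag{17}$$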








and g(0) is a constant independent of t, obtained by putting t = 0 in the above expression. Now Barnes (1899) was able to generalize Stirling's theorem. He showed that, for x real or complex (with |arg x| < π), log Γ(x + h) may be expanded in an asymptotic series valid for large |x|:

$$\log\Gamma(x+h) = \log\sqrt{2\pi} + (x+h-\tfrac12)\log x - x - \sum_{r=1}^{m}\frac{(-1)^r B_{r+1}(h)}{r(r+1)\,x^r} + R_{m+1}(x), \tag{18}$$

where R_{m+1}(x) is a remainder term such that |R_{m+1}(x)| < C/|x|^{m+1}, C is some constant independent of x, and B_r(h) is the Bernoulli polynomial of degree r and order unity defined by

$$\frac{\tau e^{h\tau}}{e^{\tau}-1} = \sum_{r=0}^{\infty}\frac{\tau^r}{r!}B_r(h). \tag{19}$$
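Barnes's expansion (18), with the Bernoulli polynomials defined by (19), is easy to test against a library log-gamma. The sketch below builds the Bernoulli numbers exactly with `fractions.Fraction`, forms B_r(h) from them, and truncates the series after m terms. It illustrates the expansion for moderately large x only, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, lgamma, log, pi


def bernoulli_numbers(n):
    """Exact B_0..B_n (with B_1 = -1/2) from sum_{k=0}^{m} C(m+1, k) B_k = 0."""
    b = [Fraction(1)]
    for m in range(1, n + 1):
        b.append(-sum(comb(m + 1, k) * b[k] for k in range(m)) / (m + 1))
    return b


def bernoulli_poly(r, h, b):
    """B_r(h) = sum_k C(r, k) B_k h^(r - k), the polynomials generated by (19)."""
    return sum(comb(r, k) * float(b[k]) * h ** (r - k) for k in range(r + 1))


def barnes_log_gamma(x, h, m=6):
    """Asymptotic series (18) for log Gamma(x + h), keeping the terms r = 1..m."""
    b = bernoulli_numbers(m + 1)
    s = 0.5 * log(2 * pi) + (x + h - 0.5) * log(x) - x
    for r in range(1, m + 1):
        s -= (-1) ** r * bernoulli_poly(r + 1, h, b) / (r * (r + 1) * x ** r)
    return s


for x, h in [(10.0, 0.5), (4.0, 0.25)]:
    print(x, h, barnes_log_gamma(x, h) - lgamma(x + h))
```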

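Expanding each of the Γ-functions in this manner, we obtain

$$\Psi(t) = Q - g(0) - \tfrac14(k-1)p(p+1)\log(1-2it) + \sum_{r=1}^{n}\frac{\alpha_r}{\mu^r}(1-2it)^{-r} + R_{n+1}(\mu,t), \tag{20}$$

where Q does not contain t and is given by

$$Q = \tfrac12 p(k-1)\log 2\pi + \sum_{j=0}^{p-1}\Big[\sum_{l=1}^{k}\tfrac12(\nu_l-j-1)\log\tfrac12\mu_l - \tfrac12(k\nu-j-1)\log\tfrac12 k\mu\Big], \tag{21}$$

$$\alpha_r = \frac{(-1)^{r+1}}{r(r+1)}\sum_{j=0}^{p-1}\Big[\sum_{l=1}^{k}\Big(\frac{2\mu}{\mu_l}\Big)^{r}B_{r+1}\Big(\frac{\beta_l-j}{2}\Big) - \Big(\frac{2}{k}\Big)^{r}B_{r+1}\Big(\frac{k\beta-j}{2}\Big)\Big], \tag{22}$$

and R_{n+1}(μ, t) is defined by (17) and (18).

From (20) we have

$$\phi(t) = K(1-2it)^{-\frac12 f}\sum_{\nu=0}^{\infty}\frac{a_\nu}{\mu^\nu}(1-2it)^{-\nu} + R'_{n+1}(\mu,t), \tag{23}$$

where K = exp{Q − g(0)}, f = ½(k − 1)p(p + 1), and a_ν is the coefficient of μ^{−ν} in the expansion of exp{Σ_r α_r μ^{−r}}.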

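The probability density function of ρM is then given by

$$p(\rho M) = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-it\rho M}\phi(t)\,dt \tag{24}$$

$$= K\sum_{\nu=0}^{\infty}\frac{a_\nu}{\mu^\nu}\,p(\chi^2_{f+2\nu}) + R'_{n+1}. \tag{25}$$

The probability that a given value M_0 of the criterion is exceeded is therefore

$$\Pr\{M > M_0\} = K\sum_{\nu=0}^{\infty}\frac{a_\nu}{\mu^\nu}P_{f+2\nu} + R''_{n+1}(\mu,t), \tag{26}$$

where

$$P_{f+2\nu} = \int_{\rho M_0}^{\infty}p(\chi^2_{f+2\nu})\,d\chi^2_{f+2\nu}, \tag{27}$$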




pM„ 


E 5(1-2it)-ttf+w»(e8M-i(frO_ l)iid(pjf) 

li—O p® 


and p(χ²_{f+2ν}) is the probability density function of the χ² distribution with f + 2ν degrees of freedom. For all sufficiently large values of μ, R''_{n+1}(μ, t) tends to zero and the required







probability will be given with sufficient accuracy by taking a suitable number of terms of the series in (26). Putting t = 0 in (20) we have

$$Q - g(0) = -\Big\{\sum_{r=1}^{n}\frac{\alpha_r}{\mu^r} + R_{n+1}(\mu,0)\Big\}. \tag{28}$$

It is found in practice that, after a few terms of the series (even in difficult cases usually not more than six), exp(−Σ α_r/μ^r) is so close to exp{Q − g(0)} = K that direct calculation of that constant is unnecessary.

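If we expand Q − g(0) as well as g(t), we obtain instead of (20)

$$\Psi(t) = -\tfrac12 f\log(1-2it) + \sum_{r=1}^{n}\frac{\alpha_r}{\mu^r}\{(1-2it)^{-r}-1\} + R_{n+1}(\mu,t) - R_{n+1}(\mu,0). \tag{29}$$

Proceeding as before we obtain

$$\Pr\{M > M_0\} = P_f + \alpha_1(P_{f+2}-P_f)\frac{1}{\mu} + \big\{\alpha_2(P_{f+4}-P_f) + \tfrac12\alpha_1^2(P_{f+4}-2P_{f+2}+P_f)\big\}\frac{1}{\mu^2} + \text{etc.} \tag{30}$$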

Thus we may use a suitable number of terms of either of the series given in (26) or (30) to obtain the probability of the criterion exceeding a given value. Formula (26) has been used in this paper, (30) being rather unwieldy if a large number of terms have to be taken.

It should be noted that in the derivation we have used two series: first the asymptotic series for the expansion of the Γ-functions, and then the exponential series. In any particular case we have to decide how many terms we need in the asymptotic series to give a sufficiently close representation of the function, and then how many terms we shall use in the exponential series. In the cases investigated here, six terms of the asymptotic series have nearly always proved adequate, as judged by the closeness of agreement between Σ α_r/μ^r and an independently calculated g(0) − Q. Often fewer terms were necessary, terms in higher powers of 1/μ having negligible effect. In the case of the exponential series, the number of terms necessary to represent exp Σ_{r=1}^{n} α_r/μ^r adequately is usually not more than eight, but has sometimes been as many as fourteen. The scale factor ρ is introduced mainly to keep the number of terms required at this stage within manageable limits. By suitable choice of this constant the values of the α's can be kept small, and fewer terms are then required in the exponential series. In effect, we are fitting a χ² series to the statistic M so that, to the order of accuracy chosen in the asymptotic series, the series has all its cumulants identical with those of M. Before we consider the problem of choosing a suitable value for ρ, we shall derive an expression for the α's, and hence the a's, in a form which is more suitable for computation.


2.2. Determination of the α's in a form suitable for calculation

From the well-known properties of Bernoulli polynomials (see, for example, Milne-Thomson, 1933), we may write the symbolic equality

$$B_r(x+y) = (B + x + y)^r, \tag{31}$$

where, after expansion, each index of B is to be replaced by the corresponding suffix. Whence



(32) 






Also, if P(x) is a polynomial in x and P'(x) is the differential coefficient of P(x) with respect to x,

$$\sum_{x=0}^{p-1}P'(x) = P(B+p) - P(B). \tag{33}$$
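The symbolic identity (33) can be checked exactly for any particular polynomial. In the sketch below (names ours), P(B + p) is expanded by the binomial theorem and each power B^k is then replaced by the Bernoulli number B_k. This is the umbral rule described after (31).

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(n):
    """Exact B_0..B_n (with B_1 = -1/2)."""
    b = [Fraction(1)]
    for m in range(1, n + 1):
        b.append(-sum(comb(m + 1, k) * b[k] for k in range(m)) / (m + 1))
    return b


def umbral_eval(coeffs, shift, b):
    """P(B + shift): expand (B + shift)^i, then replace each B^k by B_k.

    coeffs[i] is the coefficient of x^i in P(x).
    """
    total = Fraction(0)
    for i, c in enumerate(coeffs):
        total += c * sum(comb(i, k) * b[k] * Fraction(shift) ** (i - k)
                         for k in range(i + 1))
    return total


def derivative(coeffs):
    """Coefficients of P'(x)."""
    return [i * c for i, c in enumerate(coeffs)][1:]


def poly_eval(coeffs, x):
    return sum(c * Fraction(x) ** i for i, c in enumerate(coeffs))


coeffs = [2, -1, 0, 0, 3]          # P(x) = 3x^4 - x + 2
b = bernoulli_numbers(len(coeffs))
p = 7
lhs = sum(poly_eval(derivative(coeffs), x) for x in range(p))
rhs = umbral_eval(coeffs, p, b) - umbral_eval(coeffs, 0, b)
print(lhs, rhs)
```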


Thus if 


and 


P(j) = 


_2 

s + 


iM?) 


+ constant, 


p-i l — ) 

L B, J 

l-o 


- “ 2 Fr 

4+1 


B+p j 




If we denote the expression in square brackets by δ_s, we obtain from (32) and (35)


2>—1 k 

2 E B r+1 

j—0 2=1 

and in a similar way we find 


'Pi-3 




J !=1«=0\ s Js+\\2] 


(r+l\ k r + 1 ~ s (/3V+ 1 - 


s J s+l \2 


Whence from (22) we obtain α_r as a polynomial of degree r in β,

where 


' r{r+l)(r+2) a i 

A, = KJk 


5»±B 


3+1 


B+P] 

B 1 B \ 

2 j 

-M"») 


and 


1 * 


k s% 


(34) 

(35) 

(36) 

(37) 

(38) 

(39) 

(40) 


It is interesting to note the relation between these quantities and the c’s defined by Hartley; if 

c s = 2~~^ s , 7s = fc-V-Vi. (41) 

In the special case when the samples have equal degrees of freedom 

7s = 1 ~ I/*?- 


The values of δ_s for s = 0, 1, ..., 7, found from equation (39), are given below:
s S a 

o ■ -ip, 

1 ip(p+i), 

2 -T5M 2 i> a + 3p-l), 

3 tfe?(p-l)(p +1)(^ + 2 ), 

4 -xmJo(6^ 4 +1529 3 -10^ 2 -30^ + 3), 

5 T28T(T-l)(P + l)(p + 2)(2p 2 +2p-7), 

6 ~"7T8P(®P 8 f 21p 5 — 21p 4 -105p 3 + 21;p a +147p-5), 

7 jhPiP- 1) (p +1) (p + 2) (3p 4 +6^ 3 -23p*-26p + 62), J 


(42) 



and the values of the first six α's from equation (38) are
2DJ, 

a a = ffi { 3 D 1 / 3 2 + 4-D 2 /? + 2D 3 }, 

« 3 = - + 10-D a /? a + 10Z> 3 /? + 4D 4 }, 

a 4 = ^{15^+40^ + 601)3^ + 48^^+161)3}, 

« 6 = -^/<21i) 1/ fl B + 70D 2 ^+140i)3 / S 3 +1682) 4 /?2 + ii2i) 6 ^+32i) c } ; 
a, = jg/^Z),^ 6 + 28D 2 /? B + 70D 3 /? 4 + 112_D 4 /? 3 + 112D 5 /J 2 + 64Z) 6 /? + 16D 7 }, 
















2.3. Choice of the value of ρ

For the series to be of practical utility, it must be possible to represent exp{Σ_r α_r μ^{−r}(1 − 2it)^{−r}} adequately by a reasonable number of terms of the exponential series. This can be done only if the coefficients α_r are fairly small. In the univariate case these coefficients will be small even if ρ = 1. In fact, if we put p = 1 and ρ = 1 in equation (26), the series we obtain corresponds exactly with that found by Hartley (1940) using a rather different method of approach. The accuracy of Hartley's series, using only three terms in the asymptotic series, was demonstrated (Hartley & Pearson, 1946) by comparison with the significance levels obtained from Nair's exact expansion. The agreement obtained was good even when the degrees of freedom were as low as three. In the multivariate case a much more satisfactory series can be obtained if ρ is less than unity.

A typical set of curves is plotted in the figure for the case p = 5, k = 5, ν = 9. For varying values of ρ, the curves show the values of α_1/μ, α_2/μ², ..., α_6/μ⁶, and the closeness of agreement between Q − g(0) and −Σ_{r=1}^{6} α_r/μ^r. The curves all have minima or cross the zero line between ρ = 0.7 and ρ = 0.8. The value of ρ which makes α_1 zero is ρ = 0.76296.

In the calculations carried out here, ρ was chosen so that α_1 = 0. This not only resulted in the other coefficients being small, but the absence of α_1 also made the calculation of the a's much easier. Putting α_1 = 0 we obtain

$$\rho = 1 - \frac{2p^2+3p-1}{6(p+1)(k-1)}\Big(\sum_{l=1}^{k}\frac{1}{\nu_l} - \frac{1}{N}\Big). \tag{44}$$
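Equation (44) is a one-line computation. The sketch below (function name ours) reproduces the value 0.808889 quoted in Section 2.4 for p = 4, k = 5, ν = 9. With p = 5 and the same k and ν it gives 0.76296, the value quoted for the figure.

```python
def box_rho(p, dfs):
    """rho of equation (44): the scale factor that makes alpha_1 vanish."""
    k, n_total = len(dfs), sum(dfs)
    c1 = sum(1 / v for v in dfs) - 1 / n_total          # sum 1/nu_l - 1/N
    return 1 - (2 * p * p + 3 * p - 1) / (6 * (p + 1) * (k - 1)) * c1


print(round(box_rho(4, [9] * 5), 6), round(box_rho(5, [9] * 5), 5))
```

For p = 1 the bracketed factor is 1/{3(k − 1)}, so 1/ρ agrees with Bartlett's C to first order in 1/ν.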


2.4. Example of a calculation using the series

To check his two working approximations (a) and (b), Bishop used as a standard of reference the values obtained by exact fitting of type I curves to the first two moments of the criterion l_1. In the case p = 4, k = 5, ν = 9, Bishop found for the 5% point a value corresponding to M = 70.281.

To obtain from the series the probability associated with this value, we calculate

f = 40,   ρ = 0.808889,   ρM = 56.8495,   μ = ρν = 7.28.


 r (= ν)    α_r/μ^r     a_ν/μ^ν     P_{f+2ν}
    0          -        1.000000    0.040742
    1       0.000000    0.000000
    2       0.143702    0.143702    0.092597
    3       0.003675    0.003675    0.131138
    4       0.001793    0.012118    0.178763
    5       0.000094    0.000622    0.235161
    6       0.000032    0.000791    0.304909
    7          -        0.000059    0.369663
    8          -        0.000044    0.446178

Σ_{r=1}^{6} α_r/μ^r = 0.149296          Σ_{ν=0}^{8} a_ν/μ^ν = 1.161011
g(0) − Q = 0.149305                     exp{Σ_{r=1}^{6} α_r/μ^r} = 1.161016
Difference 0.000009                     Difference 0.000005

K = exp{Q − g(0)} = 0.861306,   exp(−Σ_{r=1}^{6} α_r/μ^r) = 0.861314.

Pr{M > 70.281} = K Σ_ν (a_ν/μ^ν) P_{f+2ν} = 0.0492.








To illustrate the accuracy with which the asymptotic series represents the function, independent calculations of g(0) − Q have been made. As has already been indicated, however, this rather laborious calculation would not be necessary in practice, K being taken as exp(−Σ α_r/μ^r).
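The steps of Section 2.4 can be sketched in Python. First form the a_ν from the α_r/μ^r by the exponential-series recursion ν a_ν = Σ_r r α_r a_{ν−r}. Then take K as exp(−Σ α_r/μ^r), as suggested above, and weight the χ² tail probabilities P_{f+2ν}. The α_r/μ^r values are those of the worked example. The χ² tail uses the Poisson-sum identity, which is exact only for even degrees of freedom (f = 40 here). Function names are ours.

```python
from math import exp


def chi2_sf_even(x, df):
    """Upper tail of chi-square with an even number of d.f. (Poisson-sum identity)."""
    half, term, total = x / 2, 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return exp(-half) * total


def exp_series(alphas, n_terms):
    """a_0..a_n with sum a_v z^v = exp(sum_r alphas[r-1] z^r)."""
    a = [1.0]
    for v in range(1, n_terms + 1):
        a.append(sum(r * alphas[r - 1] * a[v - r]
                     for r in range(1, min(v, len(alphas)) + 1)) / v)
    return a


def series_tail(rho_m0, f, alphas, n_terms=8):
    """Pr{M > M0} ~ K sum_v a_v P_{f+2v}, with K taken as exp(-sum alpha_r/mu^r)."""
    a = exp_series(alphas, n_terms)
    k_const = exp(-sum(alphas))
    prob = k_const * sum(a_v * chi2_sf_even(rho_m0, f + 2 * v) for v, a_v in enumerate(a))
    return prob, a


# Section 2.4: p = 4, k = 5, nu = 9, f = 40, rho * M0 = 56.8495 (alpha_r / mu^r below)
alphas = [0.0, 0.143702, 0.003675, 0.001793, 0.000094, 0.000032]
prob, a = series_tail(56.8495, 40, alphas)
print(round(prob, 4), round(sum(a), 6))
```

Because α_1 = 0 here, the first-order term drops out of the recursion, which is the simplification mentioned in Section 2.3.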

2.5. Some comparisons between the series and the exact distribution

For the cases p = 1, k = 2 and p = 2, k = 2, the exact distribution is known for all values of ν. For p = 1 the criterion is simply a function of the variance ratio, and for p = 2 the exact distribution has been found by Pearson & Wilks (1933). Table 1 compares the probabilities obtained from these exact distributions with those found using the series with scale factor ρ and up to four terms in the asymptotic and exponential series, higher terms having negligible effect. The table shows the values of M corresponding to the 5% and 1% points obtained by Bishop by fitting a type I curve to the first two moments of l_1. The exact probabilities corresponding to these points, and those obtained using the series, are shown below the values of M.


Table 1. Comparison of the series with the exact distribution

                                   ν = 9      ν = 27     ν = 79
p = 1, k = 2
   5% point (type I)              4.0499     3.9042     3.8794
   Probability: exact             0.05005    0.05009    0.05002
                series            0.05005    0.05009    0.05003
   1% point (type I)              6.9902     6.7461     6.6991
   Probability: exact             0.00998    0.01001    0.01002
                series            0.00998    0.01001    0.01002
p = 2, k = 2
   5% point (type I)              8.8801     8.1191     8.0018
   Probability: exact             0.05005    0.04997    0.04979
                series            0.05005    0.04997    0.04979
   1% point (type I)             12.8969    11.7844    11.6074
   Probability: exact             0.00999    0.01000    0.00997
                series            0.00999    0.01000    0.00997


The agreement between the series and the exact values is remarkably good, the series giving five-decimal accuracy in almost every case tested. The more difficult cases, however, are those where p and k are larger, especially when ν is small. For these, two checks support belief in the accuracy of this solution. The first is the closeness with which Σ α_r/μ^r approaches g(0) − Q. The second is the adequacy of the exponential series when ρ is suitably chosen, as judged by comparing exp{Σ α_r/μ^r} with Σ a_ν/μ^ν. For example, the case p = 4, k = 5, which we have used to illustrate the calculation of probabilities from the series, is not a particularly favourable one. It appears, however, that six terms of the asymptotic series and eight of the exponential series will be adequate; in less severe cases, of course, fewer terms are necessary. Further evidence for the accuracy of this type of solution is supplied later. In the tests of independence to be discussed in §6, exact distributions are available for comparison in cases where the series is not favoured, and excellent agreement is found.




328 


A general distribution theory for a class of likelihood criteria 

APPROXIMATIONS 

The series we have found is of rather too complicated a character for routine use; as an 
alternative, approximations were sought which were relatively simple. 


3. Approximations using a single $\chi^2$ distribution
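We have for the cumulant generating function of $M$ (putting $\rho = 1$ in equation (20))

$$K(t) = Q - g(0) - \tfrac12 f\log(1-2it) + \sum_{r=1}^{\infty}\frac{\alpha'_r}{\nu^r}(1-2it)^{-r}, \qquad (45)$$

where $f = \tfrac12 p(p+1)(k-1)$ and $\alpha'_r$ is obtained by putting $\rho = 1$ (i.e. $\beta = 0$) in equation (43).
Expanding this expression in powers of $t$ we obtain

$$K(t) = \sum_{j=1}^{\infty}\frac{(it)^j}{j!}\,2^{j-1}(j-1)!\left\{f + 2\sum_{m=1}^{\infty}\frac{(j+m-1)!}{(j-1)!\,(m-1)!}\,\frac{\alpha'_m}{\nu^m}\right\}. \qquad (46)$$

The $j$th cumulant of $M$ is then given by

$$\kappa_j = 2^{j-1}(j-1)!\,f\left\{1 + jA_1 + \tfrac12 j(j+1)A_2 + \dots\right\}, \qquad (47)$$

where

$$A_m = \frac{2m\,\alpha'_m}{f\,\nu^m}, \qquad (48)$$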


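and in particular for the generalized test for homoscedasticity which we are considering,

$$A_1 = \frac{2p^2+3p-1}{6(k-1)(p+1)}\left(\sum_{t=1}^{k}\frac{1}{\nu_t} - \frac{1}{N}\right), \qquad A_2 = \frac{(p-1)(p+2)}{6(k-1)}\left(\sum_{t=1}^{k}\frac{1}{\nu_t^2} - \frac{1}{N^2}\right). \qquad (49)$$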
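As a sketch of how (47) to (49) fit together, the Python below computes $A_1$ and $A_2$ of (49) in exact rational arithmetic and evaluates the truncated cumulants (47). The general term $\binom{j+m-1}{m}A_m$ inside the bracket of (47) is our own reading of the expansion (46) with (48); the function names are illustrative only.

```python
from fractions import Fraction
from math import comb, factorial


def homoscedasticity_A(p, nus):
    """A_1 and A_2 of equation (49) for k samples of p variates with
    degrees of freedom nus = [nu_1, ..., nu_k] (exact rationals)."""
    k, N = len(nus), sum(nus)
    gamma1 = sum(Fraction(1, v) for v in nus) - Fraction(1, N)
    gamma2 = sum(Fraction(1, v * v) for v in nus) - Fraction(1, N * N)
    A1 = Fraction(2 * p * p + 3 * p - 1, 6 * (k - 1) * (p + 1)) * gamma1
    A2 = Fraction((p - 1) * (p + 2), 6 * (k - 1)) * gamma2
    return A1, A2


def cumulant(j, f, A):
    """kappa_j of M from (47), keeping only the terms A = [A_1, A_2, ...]:
    2^(j-1) (j-1)! f {1 + sum_m C(j+m-1, m) A_m}."""
    bracket = 1 + sum(comb(j + m - 1, m) * a for m, a in enumerate(A, start=1))
    return 2 ** (j - 1) * factorial(j - 1) * f * bracket


p, nus = 3, [10, 10, 10, 10]
k = len(nus)
f = Fraction((k - 1) * p * (p + 1), 2)
A1, A2 = homoscedasticity_A(p, nus)
print(A1, A2, [cumulant(j, f, [A1, A2]) for j in (1, 2, 3, 4)])
```

For $p = 1$ the function returns $A_2 = 0$, which is the univariate case discussed in § 3.1.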


3.1. The choice of a scale factor in the $\chi^2$ approximation

Now $2^{j-1}(j-1)!\,f$ is the $j$th cumulant of $\chi^2$ with $f$ degrees of freedom. Thus, to order
$\nu^{-1}$, (47) is identical with the $j$th cumulant of $C\chi^2$, where $C$ is either $1 + A_1$ or $(1 - A_1)^{-1}$. If
$A_2$ were zero, then $C = 1 + A_1$ would give the first cumulant $\kappa_1$ to order $\nu^{-2}$, and the remaining
cumulants would clearly be less in error than if $C$ were taken as $(1 - A_1)^{-1}$. However, if
$A_2 = A_1^2$, it would be preferable to put $C = (1 - A_1)^{-1}$, since this form would then give agreement
to order $\nu^{-2}$: its $j$th power is $1 + jA_1 + \tfrac12 j(j+1)A_1^2 + \dots$, which reproduces the bracket in (47) when $A_2 = A_1^2$.

Clearly this would also be the better form to use if $A_2$ were near to or greater than $A_1^2$.
In the univariate case $A_2 = 0$, and $C$ should therefore be taken as

$$C = 1 + A_1 = 1 + \frac{1}{3(k-1)}\left(\sum_{t=1}^{k}\frac{1}{\nu_t} - \frac{1}{N}\right),$$

as has been shown by Bartlett (1937).

For the generalized test for homoscedasticity we find

$$A_2 - A_1^2 = \frac{\gamma_1^2}{36(p+1)^2(k-1)}\left\{6(p-1)(p+1)^2(p+2)\,\frac{\gamma_2}{\gamma_1^2} - \frac{(2p^2+3p-1)^2}{k-1}\right\}, \qquad (50)$$



G. E. P. Box




where $\gamma_s$ is defined by equation (40). For $p = 1$, $A_2 = 0$, and consequently this quantity is
negative for all values of $k$. When $p > 1$ it is positive, except in the particular case when
$p = 2$, $k = 2$ and the $\nu$'s are equal, when the quantity in curled brackets is equal to $-1$
and $A_1^2$ is almost exactly equal to $A_2$; if the $\nu$'s are not equal, this quantity is greater than
$-1$, and it is positive for all larger values of $p$ and $k$.

For the multivariate statistic, $p > 1$; we therefore take $M/C$ to be approximately
distributed as $\chi^2$ with $f = \tfrac12(k-1)p(p+1)$ degrees of freedom and

$$\frac{1}{C} = 1 - A_1 = 1 - \frac{2p^2+3p-1}{6(p+1)(k-1)}\left(\sum_{t=1}^{k}\frac{1}{\nu_t} - \frac{1}{N}\right);$$

if the degrees of freedom are equal this becomes

$$\frac{1}{C} = 1 - \frac{(2p^2+3p-1)(k+1)}{6(p+1)k\nu}. \qquad (51)$$

We note that $1/C$ is the same as the value $\rho$ chosen as scale factor (44) in the series solution.
In the case of samples with equal degrees of freedom, the statistic $M$ is equivalent to Bishop's
criterion $l_1$, so that the multivariate scale factor $C$ proposed here is comparable with the scale
factor $C$ proposed by Bishop and given in equation (3). Table 2 shows a number of comparisons
for the significance levels, together with the values for the probabilities given by the series.
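A minimal sketch of the $\chi^2$ approximation with this scale factor follows. Percentage points of $\chi^2$ are not in Python's standard library, so the tabulated point of $\chi^2$ on $f$ degrees of freedom is passed in; the function names are hypothetical.

```python
def chi2_scale_factor(p, nus):
    """Degrees of freedom f and scale factor C of the approximation
    M/C ~ chi2(f): Bartlett's C = 1 + A_1 when p = 1, and
    C = (1 - A_1)^(-1), i.e. 1/C as in (51), when p > 1."""
    k, N = len(nus), sum(nus)
    gamma1 = sum(1.0 / v for v in nus) - 1.0 / N
    A1 = (2 * p * p + 3 * p - 1) / (6.0 * (k - 1) * (p + 1)) * gamma1
    f = (k - 1) * p * (p + 1) // 2
    C = 1.0 + A1 if p == 1 else 1.0 / (1.0 - A1)
    return f, C


def chi2_point(chi2_upper, p, nus):
    """Significance point for M from the tabulated upper point of chi2(f)."""
    f, C = chi2_scale_factor(p, nus)
    return C * chi2_upper
```

For example, with $p = 2$, $k = 5$ and $\nu = 9$ we have $f = 12$; taking the tabulated 5% point 21.026 of $\chi^2$ on 12 degrees of freedom, `chi2_point(21.026, 2, [9] * 5)` gives about 23.27, the Box entry of Table 2.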


Table 2. $\chi^2$ approximation; comparisons of scale factors. Significance points
for $M$ with probability given by series

                              p = 2              p = 4              p = 6
                           M      Prob.       M      Prob.       M       Prob.
k = 5, ν = 9
  5%   Bishop (b)        23.08   0.0531     67.38   0.0742     142.19   0.1633
       Box               23.27   0.0503     68.93   0.0597     148.30   0.1041
  1%   Bishop (b)        28.82   0.0107     77.01   0.0173     156.35   0.0633
       Box               29.01   0.0101     78.74   0.0135     163.16   0.0286
k = 5, ν = 19
  5%   Bishop (b)        22.13   0.0486     61.36   0.0511     121.29   0.0660
       Box               22.03   0.0501     61.31   0.0515     122.82   0.0556
  1%   Bishop (b)        27.34   0.0104     69.95   0.0106     133.60   0.0144
       Box               27.47   0.0100     70.03   0.0105     135.13   0.0116


It appears not only that the factor suggested here is very much simpler than Bishop's,
but also that it gives a better approximation. However, even with the scale
factor $C$ this approximation fails when $p$ is large and $\nu$ is small.


4. Approximations using the F distribution 

The $\chi^2$ approximation becomes less and less satisfactory as $p$ and $k$ are made larger and $\nu$ is
made smaller. We know, however, that for all finite $p$ and $k$, $M/C$ will tend to a type III
curve as $\nu$ becomes large. When $\nu$ is not large we might expect the point corresponding to the
distribution of $M$ in the $(\beta_1, \beta_2)$ plane to lie near the type III line, in either the type I or the type VI
region. We shall see that the use of these curves rather than the type III will enable us to
absorb a further term in the cumulant series, corresponding to the extra adjustable
parameter available with type I and type VI curves, and thus ensure agreement in the cumulants
to order $\nu^{-2}$. Although percentage points of the $\beta$-function have been tabled (Thompson,
1941), tables of the $F$ distribution are usually more readily available. For this reason results
which occur in the $\beta$-function form will be inverted, so that only tables of the $F$ distribution
will be required in using these approximations, and they will be referred to as $F$
approximations.


4.1. Choice of relevant type of curve


The 'start' of the probability density function for $M$ is at zero. For the Pearson system of
frequency curves in which the restriction is made that the start of the curve is at zero, the
relation between the cumulants

$$\frac{\kappa_3\kappa_1}{\kappa_2^2} = 2\tau \qquad (52)$$

corresponds with Pearson's type III curve when $\tau = 1$. If $\tau$ slightly exceeds unity the curve
falls in the type VI region; if it is slightly less than unity it falls in the type I region.
Substituting the values for the cumulants of the criterion $M$, using equation (47), we obtain,
ignoring terms of order $\nu^{-3}$,

$$\tau = \frac{1 + 4A_1 + 7A_2 + 3A_1^2}{1 + 4A_1 + 6A_2 + 4A_1^2}. \qquad (53)$$

To this order $\tau = 1 + A_2 - A_1^2$, so for all sufficiently large values of $\nu$ the region into which the curve will fall is given by

$$\begin{aligned} A_2 &> A_1^2, & \tau &> 1, & &\text{type VI};\\ A_2 &= A_1^2, & \tau &= 1, & &\text{type III};\\ A_2 &< A_1^2, & \tau &< 1, & &\text{type I}. \end{aligned} \qquad (54)$$
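The criterion (52) to (54) can be checked numerically. The sketch below evaluates $\tau$ from the cumulants (47), truncated after $A_2$ (the factor $f$ cancels), and classifies the region from the sign of $A_2 - A_1^2$; the sample values of $A_1$ and $A_2$ in the usage line are arbitrary.

```python
def tau(A1, A2):
    """tau of (52), using the cumulants (47) truncated after A_2.
    The common factor f cancels in kappa_3 kappa_1 / kappa_2^2."""
    k1 = 1 + A1 + A2
    k2 = 2 * (1 + 2 * A1 + 3 * A2)
    k3 = 8 * (1 + 3 * A1 + 6 * A2)
    return k3 * k1 / (2 * k2 ** 2)


def pearson_region(A1, A2):
    """Region (54), valid for sufficiently large nu."""
    d = A2 - A1 * A1
    if d > 0:
        return "type VI"
    if d < 0:
        return "type I"
    return "type III"


print(tau(0.01, 0.0003), 1 + 0.0003 - 0.01 ** 2, pearson_region(0.01, 0.0003))
```

With $A_2 = 0$, as in the univariate test, $\tau$ falls just below unity and the type I region is selected.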


For example, from equation (50), obtained in the case of the generalized test for
homoscedasticity, it is clear that for $p = 1$ the curve will be in the type I region, and for nearly
all other cases, when $p$ is greater than 1, it will be in the type VI region.


4.2. Type VI

The $F$ distribution with $2P$ and $2Q$ degrees of freedom is defined by

$$p(F) = \text{constant}\times F^{P-1}(PF + Q)^{-(P+Q)}. \qquad (55)$$

The $r$th moment of a quantity $bF$, where $b$ is a constant, is given by

$$\mu'_r(bF) = \left(\frac{bQ}{P}\right)^{r}\frac{\Gamma(P+r)\,\Gamma(Q-r)}{\Gamma(P)\,\Gamma(Q)}, \qquad (56)$$

from which, after some algebraic reduction, we obtain the first four cumulants of $bF$ as

$$\begin{aligned}
\kappa_1(bF) &= P(b/P)(1 - 1/Q)^{-1},\\
\kappa_2(bF) &= P(b/P)^2\{1 + (P-1)/Q\}(1 - 1/Q)^{-2}(1 - 2/Q)^{-1},\\
\kappa_3(bF) &= 2P(b/P)^3\{1 + (P-1)/Q\}\{1 + (2P-1)/Q\}(1 - 1/Q)^{-3}(1 - 2/Q)^{-1}(1 - 3/Q)^{-1},\\
\kappa_4(bF) &= 6P(b/P)^4\{(P/Q)^2(5 - 11/Q)(2 + (P-2)/Q) + (1 - 1/Q)^2(1 - 3/Q + 2/Q^2 + 6P/Q - 13P/Q^2)\}\\
&\qquad\times(1 - 1/Q)^{-4}(1 - 2/Q)^{-2}(1 - 3/Q)^{-1}(1 - 4/Q)^{-1}.
\end{aligned} \qquad (57)$$





Now we have seen that $M$ is approximately distributed as $C\chi^2$, so that if $\tau$ is greater than
unity we would expect to be able to find values $b$, $P$ and $Q$ such that $bF$ would be an even better
approximation. Since we already know that the distribution is close to type III, we would
further expect that $Q$ will be large compared with $P$, since this will be so for type VI curves
close to the type III line.

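If, then, we ignore terms of order $(P/Q)^2$, we find

$$\begin{aligned}
\kappa_1(bF) &= P(b/P)\{1 + 1/Q\},\\
\kappa_2(bF) &= P(b/P)^2\{1 + P/Q + 3/Q\},\\
\kappa_3(bF) &= 2P(b/P)^3\{1 + 3P/Q + 6/Q\},\\
\kappa_4(bF) &= 6P(b/P)^4\{1 + 6P/Q + 10/Q\}.
\end{aligned} \qquad (58)$$

Now put

$$2P = f_1 = f, \qquad 2Q = f_2 = \frac{f_1 + 2}{A_2 - A_1^2} \qquad\text{and}\qquad b = \frac{f_1}{1 - A_1 - f_1/f_2};$$

then we obtain approximately

$$\begin{aligned}
\kappa_1(bF) &= f\{1 + A_1 + A_2\},\\
\kappa_2(bF) &= 2f\{1 + 2A_1 + 3A_2\},\\
\kappa_3(bF) &= 8f\{1 + 3A_1 + 6A_2\},\\
\kappa_4(bF) &= 48f\{1 + 4A_1 + 10A_2\},
\end{aligned} \qquad (59)$$

which are identical to order $\nu^{-2}$ with the cumulants of $M$ given by equation (47). Thus $M/b$
will be distributed approximately as $F$ with $f_1$ and $f_2$ degrees of freedom, where

$$f_1 = f, \qquad f_2 = \frac{f_1 + 2}{A_2 - A_1^2}, \qquad b = \frac{f_1}{1 - A_1 - f_1/f_2}. \qquad (60)$$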
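The claim that (60) matches the cumulants of $M$ to order $\nu^{-2}$ can be checked numerically. The sketch below assumes equal degrees of freedom, so that $A_1$ and $A_2$ take the forms (69) given later, and compares the exact mean and variance of $bF$ (the standard $F$-distribution moments, equivalent to $\kappa_1$ and $\kappa_2$ in (57)) with $f\{1 + A_1 + A_2\}$ and $2f\{1 + 2A_1 + 3A_2\}$. If the matching holds, the relative gaps should shrink roughly like $\nu^{-3}$.

```python
def type_vi_params(f, A1, A2):
    """Equation (60): M/b is referred to F(f1, f2); requires A2 > A1**2."""
    f1 = f
    f2 = (f1 + 2) / (A2 - A1 ** 2)
    b = f1 / (1 - A1 - f1 / f2)
    return f1, f2, b


def bF_mean_var(f1, f2, b):
    """Exact mean and variance of bF, F ~ F(f1, f2) (needs f2 > 4)."""
    mean = b * f2 / (f2 - 2)
    var = 2 * b * b * f2 ** 2 * (f1 + f2 - 2) / (f1 * (f2 - 2) ** 2 * (f2 - 4))
    return mean, var


def vi_mismatch(p, k, nu):
    """Relative gaps between bF and the truncated cumulants (47) of M,
    for k samples with equal degrees of freedom nu."""
    f = (k - 1) * p * (p + 1) / 2
    A1 = (2 * p * p + 3 * p - 1) * (k + 1) / (6 * (p + 1) * k * nu)   # (69)
    A2 = (p - 1) * (p + 2) * (k * k + k + 1) / (6 * k * k * nu * nu)  # (69)
    mean, var = bF_mean_var(*type_vi_params(f, A1, A2))
    return (abs(mean / (f * (1 + A1 + A2)) - 1),
            abs(var / (2 * f * (1 + 2 * A1 + 3 * A2)) - 1))


print(vi_mismatch(2, 5, 50), vi_mismatch(2, 5, 100))
```

Doubling $\nu$ should cut both gaps by a factor of roughly eight, as expected if the remaining error is of order $\nu^{-3}$.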


4.3. Type I

We define a quantity $X$ distributed in a type I form with parameters $P$ and $Q$,

$$p(X) = \text{constant}\times X^{P-1}(1 - X)^{Q-1}. \qquad (61)$$

The $r$th moment of $bX$, where $b$ is a constant, is given by

$$\mu'_r(bX) = b^r\,\frac{\Gamma(P+r)\,\Gamma(P+Q)}{\Gamma(P)\,\Gamma(P+Q+r)}, \qquad (62)$$

from which we find the first four cumulants of $bX$ to be

$$\begin{aligned}
\kappa_1(bX) &= P(b/Q)(1 + P/Q)^{-1},\\
\kappa_2(bX) &= P(b/Q)^2(1 + P/Q)^{-2}\{1 + (P+1)/Q\}^{-1},\\
\kappa_3(bX) &= 2P(b/Q)^3(1 - P/Q)(1 + P/Q)^{-3}\{1 + (P+1)/Q\}^{-1}\{1 + (P+2)/Q\}^{-1},\\
\kappa_4(bX) &= 6P(b/Q)^4\{1 + 1/Q - 2P/Q - 4P/Q^2 - 2P^2/Q^2 + P^2/Q^3 + P^3/Q^3\}\\
&\qquad\times(1 + P/Q)^{-4}\{1 + (P+1)/Q\}^{-2}\{1 + (P+2)/Q\}^{-1}\{1 + (P+3)/Q\}^{-1}.
\end{aligned} \qquad (63)$$

As before, if $Q$ is large compared with $P$, so that terms of order $(P/Q)^2$ may be ignored,
we obtain

$$\begin{aligned}
\kappa_1(bX) &= P(b/Q)\{1 - P/Q\},\\
\kappa_2(bX) &= P(b/Q)^2\{1 - 3P/Q - 1/Q\},\\
\kappa_3(bX) &= 2P(b/Q)^3\{1 - 6P/Q - 3/Q\},\\
\kappa_4(bX) &= 6P(b/Q)^4\{1 - 10P/Q - 6/Q\},
\end{aligned}$$



and putting

$$2P = f_1 = f, \qquad 2Q = f_2 = \frac{f_1 + 2}{A_1^2 - A_2} \qquad\text{and}\qquad b = \frac{f_2}{1 - A_1 + 2/f_2},$$

we again obtain approximately the values given in (59), which to the order of
approximation $\nu^{-2}$ are the cumulants of $M$.

Thus $M/b$ will be distributed as $X$ in expression (61) with $2P = f_1$ and $2Q = f_2$, and

$$f_1 = f, \qquad f_2 = \frac{f_1 + 2}{A_1^2 - A_2}, \qquad b = \frac{f_2}{1 - A_1 + 2/f_2}. \qquad (65)$$

Alternatively, since $QX/\{P(1 - X)\}$ has the $F$ distribution with $2P$ and $2Q$ degrees of freedom,

$$\frac{f_2 M}{f_1(b - M)}$$

will be distributed as $F$ with $f_1$ and $f_2$ degrees of freedom.

We note that although $M$ can vary from 0 to $\infty$, $bX$ can vary only between the limits
0 and $b$, so that we are fitting a curve with limited range to one with infinite range. In practice,
however, this presents no difficulty (see, for example, the comparisons of Tables 3, 4 and 5),
for since the distribution of $M$ will be near to type III, $f_2$ will be large compared with $f_1$;
consequently $b$ will be large compared with $f_1$. The mean for such curves will be
approximately equal to $f_1$, so that the range will be large compared with the mean, and the part of
the curve ignored by the truncation will be negligible.
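A sketch of the type I form (65), together with a numerical check analogous to the type VI case, follows. It assumes the univariate test with equal degrees of freedom, so that $A_1 = (k+1)/(3k\nu)$ as in (67) and $A_2 = 0$, and compares the exact mean and variance of $bX$ (standard beta-distribution moments) with $f\{1 + A_1\}$ and $2f\{1 + 2A_1\}$. It also illustrates the remark above that $b$ is very large compared with $f_1$. Function names are illustrative.

```python
def type_i_params(f, A1, A2):
    """Equation (65): M/b behaves like X of (61) with 2P = f1, 2Q = f2;
    requires A2 < A1**2."""
    f1 = f
    f2 = (f1 + 2) / (A1 ** 2 - A2)
    b = f2 / (1 - A1 + 2 / f2)
    return f1, f2, b


def bX_mean_var(f1, f2, b):
    """Exact mean and variance of bX, X with density proportional to
    X^(P-1) (1-X)^(Q-1), where P = f1/2 and Q = f2/2."""
    P, Q = f1 / 2, f2 / 2
    mean = b * P / (P + Q)
    var = b * b * P * Q / ((P + Q) ** 2 * (P + Q + 1))
    return mean, var


def to_F(M, f1, f2, b):
    """f2 M / {f1 (b - M)}, to be referred to F(f1, f2); needs 0 <= M < b."""
    return f2 * M / (f1 * (b - M))


def i_mismatch(k, nu):
    """Relative gaps between bX and the cumulants (47) of M, p = 1."""
    f, A1 = k - 1, (k + 1) / (3 * k * nu)
    mean, var = bX_mean_var(*type_i_params(f, A1, 0.0))
    return abs(mean / (f * (1 + A1)) - 1), abs(var / (2 * f * (1 + 2 * A1)) - 1)


f1, f2, b = type_i_params(4, 6 / (15 * 9), 0.0)   # k = 5, nu = 9
print(f1, f2, b, i_mismatch(5, 50), i_mismatch(5, 100))
```

Even for $\nu = 9$, $b$ comes out several hundred times larger than $f_1$, so the truncation of the range at $b$ is immaterial in practice.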


4.4. Application of the $F$ approximation in tests of homoscedasticity

From (50) we know that when $p = 1$, $A_2 - A_1^2$ is negative, and hence the type I form of the
approximation is appropriate. When $p \geq 2$ we have seen that, except for the case $p = 2$, $k = 2$,
when to this degree of approximation the curve is almost type III, $A_2 - A_1^2$ is positive and
the type VI form is appropriate.


4.41. Univariate test ($p = 1$)

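When $p = 1$, $A_2$ is zero, so that to carry out the test we calculate in turn

$$A_1 = \frac{1}{3(k-1)}\left(\sum_{t=1}^{k}\frac{1}{\nu_t} - \frac{1}{N}\right), \qquad f_1 = k - 1, \qquad f_2 = \frac{k + 1}{A_1^2}, \qquad b = \frac{f_2}{1 - A_1 + 2/f_2}, \qquad (66)$$

and refer

$$\frac{f_2 M}{f_1(b - M)}$$

to tables of the $F$ distribution with $f_1$ and $f_2$ degrees of freedom.

In the special case when the degrees of freedom are equal,

$$A_1 = \frac{k + 1}{3\nu k}. \qquad (67)$$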
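The steps (66) and (67) can be sketched as a small function. It assumes that, for $p = 1$, $M$ has the form $M = N\log s^2 - \sum_t \nu_t \log s_t^2$, with $s^2 = \sum_t \nu_t s_t^2/N$ the pooled estimate. That definition is given earlier in the paper and is not restated in this section, so treat it here as an assumption. The returned statistic is to be referred to tables of $F$ with $f_1$ and $f_2$ degrees of freedom; no $p$-value is computed.

```python
import math


def univariate_box_F(variances, nus):
    """Univariate test of homoscedasticity by the type I F approximation.
    Assumes M = N log s^2 - sum(nu_t log s_t^2), s^2 the pooled variance."""
    k, N = len(nus), sum(nus)
    pooled = sum(v * s2 for v, s2 in zip(nus, variances)) / N
    M = N * math.log(pooled) - sum(v * math.log(s2) for v, s2 in zip(nus, variances))
    A1 = (sum(1 / v for v in nus) - 1 / N) / (3 * (k - 1))   # (49) with p = 1
    f1 = k - 1                                                # (66)
    f2 = (k + 1) / A1 ** 2
    b = f2 / (1 - A1 + 2 / f2)
    return M, A1, f1, f2, f2 * M / (f1 * (b - M))


M, A1, f1, f2, F = univariate_box_F([2.1, 3.5, 1.2, 2.8], [9, 9, 9, 9])
print(M, A1, f1, f2, F)
```

With four samples of nine degrees of freedom each, $A_1$ equals $5/108$, in agreement with (67).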


To test the accuracy of the approximation we compare the values it gives for the 5% and
1% points of $M$ with those obtained from (1) Bartlett's approximation, (2) Bishop & Nair's
(1939) values and (3) the $\chi^2$ series given by Hartley, corresponding to equation (26) with
$p = 1$ and $\rho = 1$. Tables 3 and 4 are adapted from those given by Pearson & Hartley (1946),
with the values of Bartlett's approximation and of the present approximation added. In
Table 3 a number of comparisons are made for the special case where the degrees of freedom
are equal, and Table 4 shows a few comparisons for the case of five estimates of variance with
unequal degrees of freedom.

If the accuracy is judged by the closeness of agreement with the values obtained by Bishop
& Nair, it appears that the $F$ approximation is an improvement upon that suggested by





Table 3. Comparison of approximations. Significance points
for $M$ (equal degrees of freedom, $p = 1$)





5%



1% 


V 

Bartlett 

(χ²)

Box 

(F) 

Hartley 

(series) 

Bishop 
& Nair 

Bartlett 

(χ²)

Box 

(F) 

Hartley 

(series) 

Bishop 
& Nair 

3 

2 

7-32 



7-11* 

11-26 


10-57 



3 

0'88 

6-83 

6-79 

6-80f 

10-57 

10-41 

10-32 



4 

6'66 

6-83 

6-61 

6-62* 

10-23 

10-14 

10-10 



9 

6-29 

6-29 

6-28 

6-30f 

9-67 

9-64 

9-64 

9-67f 


2 

11-39 

11-23 

11-01 


15-93 

15-52 

15-16 

1.6-32* 


MM 

10-75 

■ 


10-67+ 


14-88 


14-91t 


M 


■ 




14-42 

14-46 

14-47* 



9-91 

9-89 

mmm 

9-93f 

13-87 

13-86 

13-84 

13-88’i 




19-08 

19-46 


26-68 

25-22 

24-65 

24-90* 


HS 

18-99 

18-91 

18-79 

18-82t 

24-31 

24-12 

23-97 

24-09f 


B9 

18-47 

18-41 

18-38 

18-42* 

23-65 

23-54 

23-49 

23-34* 



17-81 

17-61 


17-84f 

22-69 

22-58 

22-53 

22-48f 


* Calculated from Nair's exact distribution. † Calculated by fitting a type I curve to $L_1$.


Table 4. Comparison of approximations. Significance points for $M$
(unequal degrees of freedom, $p = 1$, $k = 5$)


N 

h 

Vi 


Vi 

Vs 

5% 

i% 

Bartlett 

(χ²)

Box 

(F) 

Hartley 

(series) 

Bishop 
& Nair 

Bartlett 

(χ²)

Box 

(F) 

Hartley 

(series) 

Bishop 
& Nair 

20 

6 

6 

4 

2 

2 




El 

14-97 

14-82 

14-62 

14-80 

45 

16 

16 

9 

2 

2 



hqPI) 

mm 

14-62 

14-51 

14-31 

14-46 

20 

5 

5 

4 

3 

3 

■ 


10-41 


14-68 

14-58 

14-51 

14-69 

45 

14 

14 

9 

4 

4 





14-00 

14-05 

14-03 

14-05 


Bartlett, and is about as accurate as Hartley's series, whilst it requires no special tables and
involves only simple calculations.

Since the approximations proposed by Bartlett, Hartley and the present author are
essentially asymptotic, it is to be expected that for small values of $\nu$, and particularly when
$\nu = 1$, the approximations will break down. This does in fact happen to a certain extent with
all of them, but it seems least serious with the present $F$ approximation; for example, when
$k = 4$, $\nu = 1$, we have

Approximation        5% point    1% point
Bartlett (χ²)          11.1        16.1
Hartley (series)        9.0        11.8
Box (F)                10.3        14.6
Nair's expansion       10.0        14.1



















For the case $\nu = 1$, Table 5 compares, for a number of values of $k$, the 5% and 1% levels
given by Bartlett's approximation and by the present method with values obtained by
Bishop & Nair (1939) using Nair's expansion.


Table 5. Comparison of the approximations when ν = 1. Significance points for $M$

Value of k             2      3      4      5      6      7      8      9     10
5% point
  Bartlett (χ²)       5.8    8.7   11.1   13.3   15.4   17.4   19.3   21.3   23.1
  Box (F)             5.1    7.9   10.3   12.6   14.6   16.7   18.6   20.5   22.4
  Nair's expansion    5.1    7.7   10.0   12.0   14.1   16.9   17.9   19.6   21.3
1% point
  Bartlett (χ²)      10.0   13.3   16.1   18.6     –    23.2   25.4   27.5   29.6
  Box (F)             7.9   11.3   14.6   17.1   19.2   21.6   23.7   25.8   27.9
  Nair's expansion    8.3   11.5   14.0   16.5   18.9     –    23.1   26.2   27.2


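4.42. Multivariate test ($p \geq 2$)

To carry out the test we calculate the quantities

$$A_1 = \frac{2p^2+3p-1}{6(k-1)(p+1)}\left(\sum_{t=1}^{k}\frac{1}{\nu_t} - \frac{1}{N}\right), \qquad A_2 = \frac{(p-1)(p+2)}{6(k-1)}\left(\sum_{t=1}^{k}\frac{1}{\nu_t^2} - \frac{1}{N^2}\right),$$

$$f_1 = \tfrac12(k-1)p(p+1), \qquad f_2 = \frac{f_1 + 2}{A_2 - A_1^2}, \qquad b = \frac{f_1}{1 - A_1 - f_1/f_2}, \qquad (68)$$

and refer $M/b$ to the tables of the $F$ distribution with $f_1$ and $f_2$ degrees of freedom.
When the degrees of freedom are equal,

$$A_1 = \frac{(2p^2+3p-1)(k+1)}{6(p+1)k\nu}, \qquad A_2 = \frac{(p-1)(p+2)(k^2+k+1)}{6k^2\nu^2}. \qquad (69)$$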
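A self-contained sketch of the multivariate procedure (68) follows. It assumes that $M = N\log|S| - \sum_t \nu_t\log|S_t|$, with $S$ the pooled covariance matrix $\sum_t \nu_t S_t/N$. That definition belongs to an earlier part of the paper and is assumed here. The log-determinant uses plain Gaussian elimination, which is adequate for the small positive-definite matrices of this example. The function falls back to the type I form (65) in the rare case where $A_2 - A_1^2$ is not positive.

```python
import math


def logdet_spd(S):
    """log |S| for a symmetric positive-definite matrix, by Gaussian
    elimination without pivoting (the pivots of such a matrix are positive)."""
    a = [list(map(float, row)) for row in S]
    n, total = len(a), 0.0
    for i in range(n):
        if a[i][i] <= 0:
            raise ValueError("matrix is not positive definite")
        total += math.log(a[i][i])
        for r in range(i + 1, n):
            factor = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= factor * a[i][c]
    return total


def box_M_F(covs, nus):
    """Generalized test of homoscedasticity by (68). Assumes
    M = N log|S| - sum(nu_t log|S_t|), with S the pooled covariance matrix."""
    k, p, N = len(covs), len(covs[0]), sum(nus)
    pooled = [[sum(v * S[i][j] for v, S in zip(nus, covs)) / N for j in range(p)]
              for i in range(p)]
    M = N * logdet_spd(pooled) - sum(v * logdet_spd(S) for v, S in zip(nus, covs))
    g1 = sum(1 / v for v in nus) - 1 / N
    g2 = sum(1 / v ** 2 for v in nus) - 1 / N ** 2
    A1 = (2 * p * p + 3 * p - 1) / (6 * (k - 1) * (p + 1)) * g1
    A2 = (p - 1) * (p + 2) / (6 * (k - 1)) * g2
    f1 = (k - 1) * p * (p + 1) / 2
    if A2 - A1 ** 2 > 0:                      # type VI, (68): refer M/b to F
        f2 = (f1 + 2) / (A2 - A1 ** 2)
        b = f1 / (1 - A1 - f1 / f2)
        return "VI", M, f1, f2, M / b
    f2 = (f1 + 2) / (A1 ** 2 - A2)            # type I, as in (65)
    b = f2 / (1 - A1 + 2 / f2)
    return "I", M, f1, f2, f2 * M / (f1 * (b - M))


S1 = [[2.0, 0.5], [0.5, 1.0]]
S2 = [[1.5, 0.2], [0.2, 2.0]]
S3 = [[1.0, 0.0], [0.0, 1.0]]
print(box_M_F([S1, S2, S3], [10, 12, 15]))
```

For $p = 2$ and $k = 3$ the type VI branch is selected, in line with the discussion of (50).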


George (1946) was able to evaluate the exact distribution of the generalized $L_1$ statistic
in simple cases, although, when the values of $p$ and $k$ are not very small, the method becomes
unmanageable. She used her exact distribution to check Bishop's approximations. Table 6
is taken from George's Table 1 and shows the equivalent value of $M$ obtained by Bishop's
empirical formula, method (a), for the 5% point, together with the exact value of the
probability obtained by George by direct integration. The probability corresponding to this
value of $M$ has also been calculated by the $\chi^2$ and $F$ approximations suggested here. Thus the
closeness with which the exact probability approaches 0.0500 indicates the accuracy of Bishop's
method, and the closeness with which the probabilities for the $\chi^2$ and $F$ approximations
coincide with the exact probability measures the accuracy of these approximations.

We see that the values given by the $F$ approximation are in excellent agreement with the
exact probabilities, and even the $\chi^2$ approximation is considerably better than Bishop's
method. Unfortunately, no exact values are available in the cases where $p$ and $k$ are larger,
when approximation to the curve is more difficult. For these distributions the series given by
formula (26), using in most cases up to six* terms in the asymptotic series and up to eight*


* When $\nu = 9$ and $p = 5$ and 6, the coefficients $\alpha$ are rather large, and ten and fourteen terms
respectively had to be used in the exponential series. When $p = 6$ there is evidence that further terms in the
asymptotic series would give closer agreement.














Table 6. 5% points for $M$ given by Bishop's empirical approximation, with their associated
probabilities calculated by (1) George's exact method, (2) the $F$ approximation, (3) the $\chi^2$
approximation

                          Probabilities                                       Probabilities
              ν     M      Exact    F      χ²                   ν      M      Exact    F      χ²
                          (George) (Box)  (Box)                               (George) (Box)  (Box)
p = 2, k = 2   9   8.924   0.0492   0.0492 0.0492   p = 3, k = 2   9   15.740   0.0461   0.0458 0.0446
              14   7.835   0.0496*  0.0494 0.0496                 14   14.434   0.0475   0.0476 0.0470
              19   7.831   0.0496*  0.0496 0.0496                 24   13.598   0.0485   0.0486 0.0485
              24   7.828   0.0498*  0.0497 0.0497                 29   13.416   0.0488   0.0489 0.0487
                                                                  39   13.211   0.0488   0.0489 0.0488
p = 2, k = 3   9  14.164   0.0491   0.0491 0.0490   p = 3, k = 3  14   23.661   0.0481   0.0479 0.0473
              19  13.285   0.0496   0.0497 0.0496                 29   22.288   0.0484   0.0480 0.0478
              29  13.031   0.0497   0.0490 0.0499   p = 4, k = 2  19   20.989   0.0461   0.0461 0.0455
                                                                  29   19.946   0.0477   0.0478 0.0476


* These values have been recalculated and do not agree with the values given in George’s table. 


Table 7. Comparisons of approximations. Significance points for $M$ obtained by various
methods, with probabilities given by series (26)

                          p = 2           p = 3           p = 4            p = 5            p = 6
                        M      Prob.    M      Prob.    M      Prob.     M       Prob.    M       Prob.
k = 5, ν = 9
 5%  Bishop (a)       23.40  0.0485     –       –     71.07  0.0434      –        –     173.17  0.0105
     Bishop (b)       23.06  0.0531     –       –     67.38  0.0742      –        –     142.19  0.1633
     Box (χ²)         23.27  0.0503   42.58  0.0532   68.93  0.0597   103.65  0.0673   148.30  0.1041
     Box (F)          23.30  0.0500   42.83  0.0506   69.84  0.0524   106.40  0.0545   153.36  0.0692
     Type I           23.26  0.0504   42.88  0.0502   70.28  0.0492   107.15  0.0500   157.38  0.0488
 1%  Bishop (a)       29.19  0.0096     –       –     81.35  0.0082      –        –     192.34  0.0010
     Bishop (b)       28.82  0.0107     –       –     77.01  0.0173      –        –     156.35  0.0633
     Box (χ²)         29.01  0.0101   50.24  0.0111   78.74  0.0135   113.54  0.0226   163.16  0.0286
     Box (F)          29.05  0.0100   50.58  0.0102   79.84  0.0105   118.56  0.0122   168.94  0.0165
     Type I           29.07  0.0099   50.59  0.0102   80.45  0.0097   120.20  0.0098   173.78  0.0097
k = 5, ν = 19
 5%  Bishop (a)       22.14  0.0486     –       –     61.63  0.0489      –        –     124.25  0.0469
     Bishop (b)       22.13  0.0486     –       –     61.36  0.0511      –        –     121.29  0.0660
     Box (χ²)         22.03  0.0501   39.09  0.0506   61.31  0.0515    89.08  0.0532   122.82  0.0556
     Box (F)          22.04  0.0500   39.14  0.0501   61.47  0.0502    89.47  0.0506   123.61  0.0508
     Type I           21.92  0.0516   39.10  0.0505   61.36  0.0511    89.69  0.0496   123.70  0.0503
 1%  Bishop (a)       27.65  0.0098     –       –     70.50  0.0096      –        –     137.09  0.0088
     Bishop (b)       27.34  0.0104     –       –     69.95  0.0106      –        –     133.60  0.0144
     Box (χ²)         27.47  0.0100   46.14  0.0102   70.03  0.0105    99.56  0.0109   135.13  0.0116
     Box (F)          27.48  0.0100   46.20  0.0100   70.23  0.0101   100.01  0.0101   136.03  0.0103
     Type I           27.55  0.0098   46.24  0.0099   70.27  0.0100   100.24  0.0098   136.32  0.0099


Biometrika 36 










terms in the exponential series, may be used as a standard for comparison. Table 7 shows the
significance points for $M$ obtained by five different methods, together with the probabilities
calculated from the series. The methods are: Bishop's empirical approximation (a), Bishop's
approximation (b), the $\chi^2$ and $F$ approximations suggested in this paper, and the fitting of
a type I curve by exact calculation of the first two moments. The values of $M$ for Bishop's
approximations and for the type I approximation have been calculated from Bishop's significance
points for $l_1$ given in his Tables 9 and 10.

If we take the series solution as supplying essentially accurate values, we confirm Bishop's
suggestion that the type I curve, fitted exactly to the first two moments of $l_1$, provides an
exceedingly good approximation. Of the working approximations, the $F$ approximation
suggested here appears to be the best, and the $\chi^2$ approximation with the generalized scale
factor $C$ will be fairly satisfactory if $p$ and $k$ are not greater than five and $\nu$ is not less than,
say, twenty.

Table 8 supplies a few comparisons with equal and unequal degrees of freedom. 


Table 8. Significance points for $M$ from $\chi^2$ and $F$ approximations for some equal and unequal
groupings, when $p = 4$ and $k = 5$, with associated probability given by series (26)

                                                     5%                 1%
  N     ν1    ν2    ν3    ν4    ν5    Approx.     M      Prob.       M      Prob.
  95    19    19    19    19    19      χ²      61.31   0.0516     70.03   0.0105
                                        F       61.47   0.0502     70.23   0.0101
  95     9     –     –     –    29      χ²      63.22   0.0578     72.33   0.0124
                                        F       63.99   0.0521     73.14   0.0107
  95     9     9     9     9    59      χ²      66.32   0.0627     75.76   0.0139
                                        F       67.39   0.0535     77.07   0.0110
  45     9     9     9     9     9      χ²      68.93   0.0597     78.74   0.0135
                                        F       69.84   0.0524     79.84   0.0105


It appears, at least for unequal samples with none of the degrees of freedom less than 9, 
that the F approximation will be fairly satisfactory. 


5. Generalization of the procedure

The method we have developed has so far been illustrated in the case of the univariate and
multivariate tests of homoscedasticity; its application is, however, more general. In
fact, the method can be used whenever, by choosing a suitable power of the original criterion,
we can obtain a statistic $W$ which has its $h$th moment of the form

$$E(W^h) = \text{constant}\times\left\{\frac{\prod_{j=1}^{k}y_j^{y_j}}{\prod_{i=1}^{m}x_i^{x_i}}\right\}^{h}\,\frac{\prod_{i=1}^{m}\Gamma\{x_i(1+h) + \xi_i\}}{\prod_{j=1}^{k}\Gamma\{y_j(1+h) + \eta_j\}}, \qquad (70)$$

where $\sum_{i=1}^{m}x_i = \sum_{j=1}^{k}y_j$.

(The constant will be obtained, of course, by putting $h = 0$ in (70) and taking the reciprocal.)
Many of the tests in Plackett's review, referred to in the introduction to this paper, fall into
this category. We have already seen that the generalized $L_1$ statistic is of this type; others
are Wilks's test for the independence of $k$ groups of variates (which has some important
special cases and will be considered in detail in the next section); the generalized test for
constancy of means, variances and covariances for $k$ samples given by Wilks (1932); and the
tests for 'compound symmetry' of variance-covariance matrices discussed by Votaw
(1948).

Another group of criteria, which has been studied by Mauchly (1940) and Wilks (1946),
arises from tests made on a single sample of $n$ $p$-variate observations. Mauchly's criterion
tests the hypothesis that the variances of the variates are all equal and that the covariances
between the variates are all zero. Wilks considered criteria for testing three further
hypotheses:

(a) That the $p$ means, $p$ variances and $\tfrac12 p(p-1)$ covariances for the variates have
respectively the same unknown values.

(b) That the variances are the same and the covariances are the same, irrespective of what
values the means have.

(c) That the means are the same (assuming (b) true).

It is hoped to consider some of these tests more closely in a later paper. Here we
shall merely note that, except for Wilks's third criterion (which is always distributed exactly
in type I form), the exact distribution of the test function is, in general, not known.
The expression for the $h$th moment, however, is in each case of the form of equation (70) and,
as is shown below, our previous approach will provide approximations in all these cases.
Tukey & Wilks (1946) have considered this class of statistics and have pointed out that
they all possess in common the property that, when the null hypothesis is true, they are
distributed as a product of independent components, each component being distributed in
type I form.

Consider the expression (70) for the $h$th moment of any statistic $W$ of this type. If we
take $M = -2\log W$ as our working statistic, and write $(1-\rho)x_i = \beta_i$, $(1-\rho)y_j = \varepsilon_j$, where
$\rho$ is a constant $< 1$ at our choice, we find for the cumulant generating function of $\rho M$

$$K(t) = g(t) - g(0), \qquad g(t) = 2it\rho\left\{\sum_{i=1}^{m}x_i\log x_i - \sum_{j=1}^{k}y_j\log y_j\right\} + \sum_{i=1}^{m}\log\Gamma\{\rho x_i(1-2it) + \beta_i + \xi_i\} - \sum_{j=1}^{k}\log\Gamma\{\rho y_j(1-2it) + \varepsilon_j + \eta_j\}, \qquad (71)$$

and $g(0)$ is independent of $t$ and is obtained by writing $t = 0$ in (71). Expanding the logarithms
of the $\Gamma$-functions by (18), we obtain the cumulant generating function of $\rho M$ in the form

$$K(t) = Q - g(0) - \tfrac12 f\log(1-2it) + \sum_{r=1}^{\infty}\omega_r(1-2it)^{-r}, \qquad (72)$$




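where

$$f = -2\left\{\sum_{i=1}^{m}\xi_i - \sum_{j=1}^{k}\eta_j - \tfrac12(m-k)\right\}, \qquad (73)$$

$$\omega_r = \frac{(-1)^{r+1}}{r(r+1)}\left\{\sum_{i=1}^{m}\frac{B_{r+1}(\beta_i + \xi_i)}{(\rho x_i)^r} - \sum_{j=1}^{k}\frac{B_{r+1}(\varepsilon_j + \eta_j)}{(\rho y_j)^r}\right\}, \qquad (74)$$

$$Q = \tfrac12(m-k)\log 2\pi - \tfrac12 f\log\rho + \sum_{i=1}^{m}(x_i + \xi_i - \tfrac12)\log x_i - \sum_{j=1}^{k}(y_j + \eta_j - \tfrac12)\log y_j. \qquad (75)$$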
From the cumulant generating function (72), the asymptotic $\chi^2$ series corresponding to
(26) and (30) are immediately obtainable. Alternatively, we may obtain approximations in the
manner given in § 3; the method outlined there is clearly perfectly general for this whole class
of statistics. We need the quantities $A_1 = 2\omega'_1/f$ and $A_2 = 4\omega'_2/f$, where $\omega'_r$ is the value taken
on by $\omega_r$ when $\rho = 1$. The scale factor $C$ for the $\chi^2$ approximation will be $1 + A_1$ or $(1 - A_1)^{-1}$;
a decision between the two alternative forms can be reached by the considerations set out
in § 3.1. Then, to this order of approximation, $M/C$ will be distributed as $\chi^2$. If greater
accuracy is required we may use the $F$ type approximations described in § 4. The particular
form is decided by the sign of the quantity $A_2 - A_1^2$. If this quantity is positive, the curve of
best fit will be type VI. Putting


$$f_1 = f, \qquad f_2 = \frac{f_1 + 2}{A_2 - A_1^2}, \qquad b = \frac{f_1}{1 - A_1 - f_1/f_2}, \qquad (76)$$

$M/b$ is distributed approximately as the variance ratio $F$ with $f_1$ and $f_2$ degrees of freedom.
Alternatively, if $A_2 - A_1^2$ is negative the best fitting curve will be type I, and if we put

$$f_1 = f, \qquad f_2 = \frac{f_1 + 2}{A_1^2 - A_2}, \qquad b = \frac{f_2}{1 - A_1 + 2/f_2}, \qquad (77)$$

then approximately

$$\frac{f_2 M}{f_1(b - M)}$$

will be distributed as $F$ with $f_1$ and $f_2$ degrees of freedom.

There are thus a number of possible levels of approximation, as measured by the order of
agreement between the cumulants of the statistic and those of the fitted curve.

(1) Ignoring terms of order $x_i^{-1}$, $y_j^{-1}$, $M$ is distributed as $\chi^2$.

(2) Ignoring terms of order $x_i^{-2}$, $y_j^{-2}$, a quantity $C$ can be found, by a technique originally used by
Bartlett and here generalized, such that $M$ is distributed as $C\chi^2$.

(3) Ignoring terms of order $x_i^{-3}$, $y_j^{-3}$, a function of $M$ can be obtained which is distributed
as the variance ratio $F$.

(4) Finally, for very precise work and for checking other approximations, a $\chi^2$ series
solution may be used, and here agreement with the cumulants of the statistic (as represented by their
asymptotic expansions) can be obtained to as great an order as seems profitable.

In practice method (4) is sometimes rather long, although it has been found very accurate,
but (3) involves very little labour and will often be sufficiently precise.

As a second example of the application of this technique we consider Wilks’s generalized 
test of independence. 


6. The generalized test for independence

Wilks (1935) considered the following problem: suppose we have a sample of $\nu + u$
observations from a $kp$-variate normal population and we have some a priori reason for dividing
the variates into $k$ groups containing $p_1, \ldots, p_l, \ldots, p_k$ variates (where $\sum_{l=1}^{k}p_l = kp$;
$p$ is thus the average size of the groups and is not necessarily an integer). It is required to test





the hypothesis that the $k$ groups of residuals, obtained after fitting $u$ independent constants
to each of the variates, are mutually independent.

If $|c_{ij}|$ is the $kp\times kp$ determinant of sums of squares and products of residuals for the $kp$
variates and $|c_{ij}|_l$ is the $p_l\times p_l$ determinant of sums of squares and products of residuals of
the $l$th group, then the likelihood ratio criterion obtained by Wilks is

$$\Lambda = \frac{|c_{ij}|}{\prod_{l=1}^{k}|c_{ij}|_l} = \frac{|r_{ij}|}{\prod_{l=1}^{k}|r_{ij}|_l}, \qquad (78)$$


where $|r_{ij}|$ and $|r_{ij}|_l$ are the corresponding determinants of sample correlation coefficients
having $\nu$ degrees of freedom. Wilks obtained the moments, and also, for special sets of values
of $k$ and $p_l$, the exact distribution of his criterion, which generalizes a very large class of
statistical tests. Problems in which there are more than two groups of variates, i.e. where
$k > 2$, occur for example in educational research; we may have some prior reason for believing
that a battery of, say, ten different tests applied to pupils may be divided up into a number
of groups, each group concerned with some distinct ability, and may wish therefore to test
the hypothesis that, when the means are eliminated, the selected groups are independent
of each other.

When $k = 2$, we consider only two groups of variates, containing $p_1$ in the first and $p_2$ in
the second. Since the criterion and its distribution will be unaffected if the set of $p_2$ variates
are fixed independent variables and the set of $p_1$ variates 'dependent' variables distributed
in a $p_1$-variate normal distribution, the function is then appropriate for testing the general
multivariate linear hypothesis (see, for example, Bartlett, 1934, 1938, 1947). If, in addition,
$p_1 = 1$, then the likelihood criterion is $\Lambda = 1 - R^2$, where $R$ is the coefficient of multiple
correlation between the single dependent variate and the $p_2$ independent variates. A second
special case of Wilks's statistic which is of some interest, and is considered more fully later
in this section, occurs when there is only one variate in each of the $k$ groups. The statistic
then supplies an overall test for independence between $k$ variates. For the general statistic
Wald & Brookner (1941), using a rather different technique from that of Wilks, were able to
extend the catalogue of values of $k$ and $p_l$ for which the distribution of $\Lambda$ is exactly known in
terms of elementary functions, to include all cases where at most one group contains an odd
number of variates. These distributions, although exact, are rather complicated in character.
As an alternative and to cover the remaining cases, these authors obtained a series solution,
and Rao (1948) modified this series in the important case where $k = 2$ to provide an improved
test in problems of multivariate analysis. These series will later appear as special cases of
that which we are now investigating.


6.1. Derivation of the series
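The $h$th moment of $\Lambda$ is given by Wilks as

$$E(\Lambda^h) = \prod_{j=0}^{p-1}\frac{\Gamma\{\tfrac12(\nu-j)+h\}}{\Gamma\{\tfrac12(\nu-j)\}}\;\prod_{i=1}^{k}\prod_{l=0}^{p_i-1}\frac{\Gamma\{\tfrac12(\nu-l)\}}{\Gamma\{\tfrac12(\nu-l)+h\}}, \qquad p = \sum_{i=1}^{k} p_i. \tag{79}$$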


So if we write $W = \Lambda^{\frac12\nu}$, the $h$th moment of $W$ will be in the form given in equation (70). Taking as our logarithmic statistic $M = -2\log W$, we obtain

$$M = -\nu\log\Lambda. \tag{80}$$
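The criterion and the logarithmic statistic (80) are easy to compute directly. The sketch below assumes the usual form of Wilks's criterion, $\Lambda = |A| / \prod_i |A_{ii}|$. Here A is the sample dispersion (or correlation) matrix with the variates ordered group by group, and the $A_{ii}$ are its diagonal blocks. That form is not displayed in this excerpt. The helper names `det`, `wilks_lambda` and `statistic_M` are hypothetical, and ν is simply passed in.

```python
import math

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]   # work on a float copy
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0                               # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result                         # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def wilks_lambda(matrix, group_sizes):
    """Lambda = |A| / prod_i |A_ii|, the A_ii being consecutive diagonal blocks."""
    if sum(group_sizes) != len(matrix):
        raise ValueError("group sizes must add up to the matrix order")
    denominator = 1.0
    start = 0
    for size in group_sizes:
        block = [row[start:start + size] for row in matrix[start:start + size]]
        denominator *= det(block)
        start += size
    return det(matrix) / denominator

def statistic_M(lam, nu):
    """M = -nu log Lambda."""
    return -nu * math.log(lam)

R = [[1.0, 0.6], [0.6, 1.0]]
lam = wilks_lambda(R, [1, 1])      # two single-variate groups: 1 - r^2
print(lam, statistic_M(lam, 20))
```

With two single-variate groups the sketch returns $1 - r^2$, which matches the special case discussed later in the section.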



340 A general distribution theory for a class of likelihood criteria 

To obtain the series, we begin as before by defining the relationship between a quantity $\rho$ (less than unity) and quantities $\mu$ and $\beta$ by the equations $\rho = \mu/\nu$, $\nu = \mu + \beta$. It is also convenient to define a set of quantities

$$S_s = p^s - \sum_{i=1}^{k} p_i^s, \tag{81}$$

which appear in the solution in much the same way as the analogous sums appear in the tests for homoscedasticity. Then, as before, we obtain equation (72) for the cumulant generating function of $\rho M$. The constants are available by direct substitution in (73), (74) and (76):

$$f = \tfrac12 S_2, \tag{82}$$

$$\alpha_r = \frac{(-1)^{r+1}\,2^r}{r(r+1)}\left\{\sum_{j=0}^{p-1} B_{r+1}\!\left(\tfrac12(\beta-j)\right) - \sum_{i=1}^{k}\sum_{l=0}^{p_i-1} B_{r+1}\!\left(\tfrac12(\beta-l)\right)\right\}, \tag{83}$$

where $B_{r+1}(\cdot)$ denotes the Bernoulli polynomial of degree $r+1$.
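The Bernoulli-polynomial expression for $\alpha_r$ can be checked in exact rational arithmetic. The sketch assumes

$\alpha_r = \frac{(-1)^{r+1}2^r}{r(r+1)}\{\sum_{j=0}^{p-1} B_{r+1}(\tfrac12(\beta-j)) - \sum_i\sum_{l=0}^{p_i-1} B_{r+1}(\tfrac12(\beta-l))\}$,

with $S_s = p^s - \sum p_i^s$ and $f = \tfrac12 S_2$. The function names are illustrative only. For k = 3 groups of two variates, it should give $\alpha'_1 = 19$ at $\beta = 0$ and $\alpha_1 = 0$ at $\beta = (2S_3 + 3S_2)/(12f)$.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(n_max):
    """Exact Bernoulli numbers B_0..B_n_max (convention B_1 = -1/2)."""
    B = [Fraction(0)] * (n_max + 1)
    B[0] = Fraction(1)
    for m in range(1, n_max + 1):
        B[m] = -sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1)
    return B

def bernoulli_poly(n, x, B):
    """B_n(x) = sum_k C(n, k) B_k x^(n - k)."""
    return sum(comb(n, k) * B[k] * x ** (n - k) for k in range(n + 1))

def S(s, sizes):
    """S_s = p^s - sum_i p_i^s with p = sum_i p_i."""
    return sum(sizes) ** s - sum(q ** s for q in sizes)

def alpha(r, beta, sizes):
    """alpha_r from the Bernoulli-polynomial form assumed above."""
    B = bernoulli_numbers(r + 1)
    beta = Fraction(beta)

    def block(q):   # sum over j = 0..q-1 of B_{r+1}((beta - j) / 2)
        return sum(bernoulli_poly(r + 1, (beta - j) / 2, B) for j in range(q))

    bracket = block(sum(sizes)) - sum(block(q) for q in sizes)
    return Fraction((-1) ** (r + 1) * 2 ** r, r * (r + 1)) * bracket

sizes = [2, 2, 2]                       # k = 3 groups of two variates
f = Fraction(S(2, sizes), 2)
beta_star = Fraction(2 * S(3, sizes) + 3 * S(2, sizes), 12) / f
print(alpha(1, 0, sizes), alpha(1, beta_star, sizes))
```

Exact fractions are used so that the vanishing of $\alpha_1$ shows up as an exact zero rather than a rounding residue.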


The calculation of $\alpha_r$ from formula (83) would clearly be extremely laborious for all but small numbers of variates, so we seek an alternative, simpler form. Using relations (32) and (36), we find

(_iy+i r+l 
ar ~ r(r+ 1 ) (r + 2 ) s 5o 

where tj&ft * B s+1 [ - —~j - 5 S+1 ( - , ( 86 ) 

and the values taken by (86) when p = 1, 2, ..., 7 are given by putting $p = p_i$ in equation (42). Write $\alpha'_r$ for the value which $\alpha_r$ has when $\rho = 1$, i.e. when $\beta = 0$. Substituting for $d_s(p_i)$ in (85) and summing, we obtain the first six values of $\alpha'_r$:

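$$\alpha'_1 = \tfrac{1}{24}\{2S_3 + 3S_2\},$$
$$\alpha'_2 = \tfrac{1}{48}\{S_4 + 2S_3 - S_2\},$$
$$\alpha'_3 = \tfrac{1}{720}\{6S_5 + 15S_4 - 10S_3 - 30S_2\},$$
$$\alpha'_4 = \tfrac{1}{480}\{2S_6 + 6S_5 - 5S_4 - 20S_3 + 3S_2\},$$
$$\alpha'_5 = \tfrac{1}{840}\{2S_7 + 7S_6 - 7S_5 - 35S_4 + 7S_3 + 49S_2\},$$
$$\alpha'_6 = \tfrac{1}{2016}\{3S_8 + 12S_7 - 14S_6 - 84S_5 + 21S_4 + 196S_3 - 10S_2\}, \tag{87}$$

where $S_s$ is defined by equation (81). We then have for the $\alpha$'s

$$\alpha_1 = \alpha'_1 - \tfrac12 f\beta,$$
$$\alpha_2 = \alpha'_2 - \alpha'_1\beta + \tfrac14 f\beta^2,$$
$$\alpha_3 = \alpha'_3 - 2\alpha'_2\beta + \alpha'_1\beta^2 - \tfrac16 f\beta^3,$$
$$\alpha_4 = \alpha'_4 - 3\alpha'_3\beta + 3\alpha'_2\beta^2 - \alpha'_1\beta^3 + \tfrac18 f\beta^4,$$
$$\alpha_5 = \alpha'_5 - 4\alpha'_4\beta + 6\alpha'_3\beta^2 - 4\alpha'_2\beta^3 + \alpha'_1\beta^4 - \tfrac{1}{10} f\beta^5,$$
$$\alpha_6 = \alpha'_6 - 5\alpha'_5\beta + 10\alpha'_4\beta^2 - 10\alpha'_3\beta^3 + 5\alpha'_2\beta^4 - \alpha'_1\beta^5 + \tfrac{1}{12} f\beta^6. \tag{88}$$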
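The polynomial forms for $\alpha'_r$ and the binomial relations are much cheaper to evaluate than the Bernoulli sums. In general those relations read

$\alpha_r = \sum_{j=0}^{r-1}(-1)^j\binom{r-1}{j}\alpha'_{r-j}\beta^j + (-1)^r f\beta^r/(2r)$.

This sketch codes them directly, using exact fractions and hypothetical helper names. It then applies the check mentioned later for k = 2. With p = 1, q = 2 and $\beta$ chosen to make $\alpha_1$ vanish, every coefficient should be zero.

```python
from fractions import Fraction
from math import comb

def S(s, sizes):
    """S_s = p^s - sum_i p_i^s, with p = sum_i p_i."""
    return sum(sizes) ** s - sum(q ** s for q in sizes)

def alpha_prime(sizes):
    """alpha'_1 .. alpha'_6: the coefficients at rho = 1 (beta = 0)."""
    s = {n: S(n, sizes) for n in range(2, 9)}
    F = Fraction
    return [
        F(2 * s[3] + 3 * s[2], 24),
        F(s[4] + 2 * s[3] - s[2], 48),
        F(6 * s[5] + 15 * s[4] - 10 * s[3] - 30 * s[2], 720),
        F(2 * s[6] + 6 * s[5] - 5 * s[4] - 20 * s[3] + 3 * s[2], 480),
        F(2 * s[7] + 7 * s[6] - 7 * s[5] - 35 * s[4] + 7 * s[3] + 49 * s[2], 840),
        F(3 * s[8] + 12 * s[7] - 14 * s[6] - 84 * s[5] + 21 * s[4]
          + 196 * s[3] - 10 * s[2], 2016),
    ]

def alpha_shifted(sizes, beta):
    """alpha_1 .. alpha_6 at a general beta via the binomial relations."""
    ap = alpha_prime(sizes)              # ap[r - 1] holds alpha'_r
    f = Fraction(S(2, sizes), 2)
    beta = Fraction(beta)
    out = []
    for r in range(1, 7):
        total = sum((-1) ** j * comb(r - 1, j) * ap[r - 1 - j] * beta ** j
                    for j in range(r))
        total += (-1) ** r * f * beta ** r / (2 * r)
        out.append(total)
    return out

def beta_star(sizes):
    """The beta that makes alpha_1 vanish: (2 S_3 + 3 S_2) / (12 f)."""
    f = Fraction(S(2, sizes), 2)
    return Fraction(2 * S(3, sizes) + 3 * S(2, sizes), 12) / f

# Bartlett's exact case p = 1, q = 2: every coefficient should vanish.
print(alpha_shifted([1, 2], beta_star([1, 2])))
```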



G. E. P. Box 341 

As before, from the cumulant generating function we obtain the series corresponding to (26) and (30). If $\rho$ is chosen so that $\alpha_1 = 0$, we have

$$f = \tfrac12 S_2, \qquad \rho = 1 - \frac{2S_3 + 3S_2}{12 f\nu}. \tag{89}$$

Wald & Brookner (1941) derived a $\chi^2$ series for this statistic by a different method from that used here. It is not difficult to show, however, that their series is equivalent to ours with $\rho = 1$. In this form the series is of little practical use for small, or even moderate, values of $\nu$. The reason is the difficulty, noted before, of adequately approximating $\exp\{\sum_r \alpha_r\mu^{-r}(1-2it)^{-r}\}$ by a series unless the $\alpha_r$ are small or $\mu$ is large. By introducing the factor $\rho$, the size of the coefficients $\alpha$ can be greatly reduced, and the series can be used even for fairly small values of $\nu$. As an example, consider three groups of variates with two variates in each group, k = 3, $p_1 = p_2 = p_3 = 2$, and suppose $\nu = 10$. The values of the coefficients $\alpha_r/\mu^r$ are shown below for $\rho = 1$ and for the value of $\rho$ that makes $\alpha_1$ zero. When $\rho = 1$, $\mu$ is of course equal to $\nu$.



Values of α_r/μ^r

  r          ρ = 1         ρ = 0·683
  1        1·900,00        0·000,00
  2        0·335,00        0·073,17
  3        0·086,33        0·003,71
  4        0·026,88        0·001,78
  5        0·009,40        0·000,23
  6        0·003,55        0·000,07
  Total    2·361,16        0·078,96
  g(0) − Q 2·363,61        0·078,98
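The entries of this table can be reproduced from the polynomial forms for $\alpha'_r$ and the binomial relations in $\beta$. The sketch assumes $\nu = \mu + \beta$, so that $\mu = \nu$ at $\rho = 1$. For the second column it takes $\beta = (2S_3 + 3S_2)/(12f)$, which for k = 3, $p_i = 2$, $\nu = 10$ gives $\rho = 41/60 \approx 0{\cdot}683$. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def S(s, sizes):
    """S_s = p^s - sum_i p_i^s."""
    return sum(sizes) ** s - sum(q ** s for q in sizes)

def alpha_prime(sizes):
    """alpha'_1 .. alpha'_6 (rho = 1) as polynomials in the S_s."""
    s = {n: S(n, sizes) for n in range(2, 9)}
    F = Fraction
    return [
        F(2 * s[3] + 3 * s[2], 24),
        F(s[4] + 2 * s[3] - s[2], 48),
        F(6 * s[5] + 15 * s[4] - 10 * s[3] - 30 * s[2], 720),
        F(2 * s[6] + 6 * s[5] - 5 * s[4] - 20 * s[3] + 3 * s[2], 480),
        F(2 * s[7] + 7 * s[6] - 7 * s[5] - 35 * s[4] + 7 * s[3] + 49 * s[2], 840),
        F(3 * s[8] + 12 * s[7] - 14 * s[6] - 84 * s[5] + 21 * s[4]
          + 196 * s[3] - 10 * s[2], 2016),
    ]

def alphas(sizes, beta):
    """alpha_1 .. alpha_6 at a given beta (binomial relations in beta)."""
    ap = alpha_prime(sizes)
    f = Fraction(S(2, sizes), 2)
    out = []
    for r in range(1, 7):
        total = sum((-1) ** j * comb(r - 1, j) * ap[r - 1 - j] * beta ** j
                    for j in range(r))
        out.append(total + (-1) ** r * f * beta ** r / (2 * r))
    return out

def coefficient_table(sizes, nu):
    """alpha_r / mu^r for rho = 1 and for the rho that makes alpha_1 = 0."""
    f = Fraction(S(2, sizes), 2)
    beta_star = Fraction(2 * S(3, sizes) + 3 * S(2, sizes), 12) / f
    columns = {}
    for label, beta in (("rho = 1", Fraction(0)), ("alpha_1 = 0", beta_star)):
        mu = nu - beta                   # since nu = mu + beta
        ratios = [float(a / mu ** r)
                  for r, a in enumerate(alphas(sizes, beta), start=1)]
        columns[label] = (float(mu / nu), ratios)
    return columns

table = coefficient_table([2, 2, 2], 10)
for label, (rho, ratios) in table.items():
    print(label, round(rho, 3), [round(x, 5) for x in ratios],
          round(sum(ratios), 5))
```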


For the Wald & Brookner series, if $\nu$ is small, the coefficients are so large that in practice the exponent could not be represented adequately by a reasonably small number of terms of the exponential series. By suitably choosing $\rho$, however, the size of the coefficients is greatly reduced, while the agreement between the sum of the terms and $g(0) - Q$ is improved. In the particular example quoted, the exact distribution is known (Wilks, 1935). It appears in rather a complicated form, but has been used here to check the series and the approximations. Table 9 shows the 5% and 1% significance points for the criterion M


Table 9. Some comparisons for Wilks's statistic (k = 3, p_1 = p_2 = p_3 = 2)

                      χ² approximation                  F approximation
                              Probability                       Probability
  ν    Level        M       Exact    Series          M      Exact    Series
  10    5%       30·770              0·0612       31·357
        1%       38·366              0·0139       39·180
  20    5%       24·982              0·0516       25·083
        1%       31·149              0·0105       31·292








obtained by using the $\chi^2$ and F approximations derived below. It also shows the exact probabilities and the probabilities calculated from the series, using $\rho = 0{\cdot}683$, six terms in the asymptotic series and eight in the exponential series. The series and the exact value of the probability usually agree to four places of decimals.

$\chi^2$ approximation. Following the previous procedure, we find that $M/C$ is distributed approximately as $\chi^2$ with $f$ degrees of freedom, where

$$C^{-1} = 1 - \frac{2S_3 + 3S_2}{12 f\nu} \quad\text{and}\quad f = \tfrac12 S_2.$$
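A minimal sketch of this $\chi^2$ approximation follows. It assumes $C^{-1} = 1 - (2S_3 + 3S_2)/(12f\nu)$ and $f = \tfrac12 S_2$. To stay within the standard library, it handles only even $f$, where the $\chi^2$ tail has the closed form $e^{-x/2}\sum_{i<f/2}(x/2)^i/i!$. Percentage points are found by bisection. For k = 3, $p_i = 2$, it should reproduce the $\chi^2$ points of Table 9 to about three decimals. The function names are hypothetical.

```python
import math

def S(s, sizes):
    """S_s = p^s - sum_i p_i^s."""
    return sum(sizes) ** s - sum(q ** s for q in sizes)

def chi2_sf_even(x, df):
    """P(chi-square_df > x) for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    if df % 2:
        raise ValueError("this sketch only handles even degrees of freedom")
    term = math.exp(-x / 2)
    total = term
    for i in range(1, df // 2):
        term *= x / (2 * i)
        total += term
    return total

def chi2_upper_point(alpha, df, lo=0.0, hi=1000.0):
    """Upper 100*alpha % point by bisection (the tail decreases in x)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_sf_even(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def m_point_chi2(sizes, nu, alpha):
    """Significance point of M from M / C ~ chi-square_f, with 1/C = rho."""
    f = S(2, sizes) // 2
    rho = 1 - (2 * S(3, sizes) + 3 * S(2, sizes)) / (12 * f * nu)
    return chi2_upper_point(alpha, f) / rho

for nu in (10, 20):
    print(nu, round(m_point_chi2([2, 2, 2], nu, 0.05), 3),
          round(m_point_chi2([2, 2, 2], nu, 0.01), 3))
```

Odd $f$ (for example three single variates) would need the error-function form of the tail instead.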

F approximation. We have

$$f_1 = \tfrac12 S_2, \qquad A_1 = \frac{2S_3 + 3S_2}{12 f_1\nu}, \qquad A_2 = \frac{S_4 + 2S_3 - S_2}{12 f_1\nu^2},$$

from which, using equations (76) and (77), the F-type approximation can easily be computed. The quantities $S_2$, $S_3$ and $S_4$ required in these approximations are given by (81). The calculations of Table 9 give some indication of the accuracy to be expected.
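The sign of $A_2 - A_1^2$ decides between the type I and type VI forms of the F approximation. This sketch computes $f_1$, $A_1$ and $A_2$ exactly from the $S_s$. It assumes $A_1 = (2S_3 + 3S_2)/(12f_1\nu)$ and $A_2 = (S_4 + 2S_3 - S_2)/(12f_1\nu^2)$, with $f_1 = \tfrac12 S_2$. The function name and the returned labels are hypothetical.

```python
from fractions import Fraction

def S(s, sizes):
    """S_s = p^s - sum_i p_i^s."""
    return sum(sizes) ** s - sum(q ** s for q in sizes)

def f_approximation_inputs(sizes, nu):
    """f_1, A_1, A_2 and the form indicated by the sign of A_2 - A_1^2."""
    f1 = Fraction(S(2, sizes), 2)
    A1 = Fraction(2 * S(3, sizes) + 3 * S(2, sizes)) / (12 * f1 * nu)
    A2 = Fraction(S(4, sizes) + 2 * S(3, sizes) - S(2, sizes)) / (12 * f1 * nu ** 2)
    d = A2 - A1 ** 2
    if d > 0:
        form = "type VI"
    elif d < 0:
        form = "type I"
    else:
        form = "boundary (A2 = A1^2)"
    return f1, A1, A2, form

for sizes in ([2, 2], [1, 1, 1], [1, 1, 1, 1], [1, 2]):
    print(sizes, f_approximation_inputs(sizes, 10))
```

For k = 2 this reduces to $A_2 - A_1^2 = (p^2 + q^2 - 5)/(12\nu^2)$. For single-variate groups it gives $(2k^2 - 2k - 13)/(36\nu^2)$, which is negative for k = 2, 3 and positive beyond.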


6·2. Special cases

We consider two important special cases of the statistic: that in which there are only two groups of variates, and that in which there is only one variate in each of the k groups.

6·21. Case k = 2

In this case the expressions for the coefficients in the series simplify considerably. Writing $p_1 = p$, $p_2 = q$, we obtain

$$f = pq, \qquad \beta = \tfrac12(p + q + 1),$$
$$\alpha_1 = 0, \qquad \alpha_2 = \frac{pq}{48}(p^2 + q^2 - 5),$$
$$\alpha_3 = 0, \qquad \alpha_4 = \frac{pq}{1920}\{3p^4 + 3q^4 + 10p^2q^2 - 50(p^2 + q^2) + 159\},$$
$$\alpha_5 = 0, \qquad \alpha_6 = \frac{pq}{16128}\{3(p^6 + q^6) - 105(p^4 + q^4) + 1113(p^2 + q^2) + (21p^2 - 350 + 21q^2)p^2q^2 - 2995\}.$$
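The closed forms for k = 2 can be checked cheaply. This sketch evaluates $\alpha_2$, $\alpha_4$ and $\alpha_6$ exactly, using the denominators 48, 1920 and 16128 of the formulas as written here. It confirms that all three vanish for Bartlett's exact case p = 1, q = 2. The helper name `alphas_k2` is hypothetical.

```python
from fractions import Fraction

def alphas_k2(p, q):
    """alpha_2, alpha_4, alpha_6 for two groups of p and q variates."""
    a2 = Fraction(p * q, 48) * (p**2 + q**2 - 5)
    a4 = Fraction(p * q, 1920) * (3 * p**4 + 3 * q**4 + 10 * p**2 * q**2
                                  - 50 * (p**2 + q**2) + 159)
    a6 = Fraction(p * q, 16128) * (3 * (p**6 + q**6) - 105 * (p**4 + q**4)
                                   + 1113 * (p**2 + q**2)
                                   + (21 * p**2 - 350 + 21 * q**2) * p**2 * q**2
                                   - 2995)
    return a2, a4, a6

print(alphas_k2(1, 2))   # Bartlett's exact case: all zero
print(alphas_k2(1, 1))
```

Replacing the constant 159 by 150 in $\alpha_4$ would make $\alpha_4(1, 2) = -9 \cdot 2/1920 \neq 0$. This is one quick way to see that 159 must be the correct constant.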


Putting these values in (26) and (30), we confirm* the series given to terms in $\mu^{-4}$ by Rao (1948) for this case, k = 2. Bartlett (1938) had already obtained the $\chi^2$ approximation using the scale factor

$$\rho = 1 - \frac{p + q + 1}{2\nu}$$

(which is, of course, the factor given by the present procedure). Rao introduced this scale factor into the Wald & Brookner series, so as to obtain a $\chi^2$ series with Bartlett's $\chi^2$ approximation as the leading term, equivalent to (30). As we have seen, in this particular case this choice of factor makes $\alpha_1$ and all the $\alpha$'s of odd order zero, so that the calculation of the series is correspondingly simpler.


* There appears to be a misprint in Rao's paper in the expression corresponding to $\alpha_4$, where the constant 159 is wrongly given as 150.





$\chi^2$ (Bartlett) approximation. $\dfrac{M}{C} = \left(1 - \dfrac{p+q+1}{2\nu}\right)M$ is approximately distributed as $\chi^2$ with $f = pq$ degrees of freedom.

F approximation. We find $A_2 - A_1^2 = (p^2 + q^2 - 5)/(12\nu^2)$. Thus $A_2 - A_1^2 > 0$ whenever $p^2 + q^2 > 5$, and the type VI form will then be appropriate. $M/b$ will be approximately distributed as F with $f_1$ and $f_2$ degrees of freedom, where

$$f_1 = pq, \qquad f_2 = \frac{12\nu^2(pq + 2)}{p^2 + q^2 - 5}, \qquad b = \frac{pq}{1 - \dfrac{p+q+1}{2\nu} - \dfrac{f_1}{f_2}}.$$
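The k = 2 formulas translate directly into code. This sketch assumes the standard type VI form of Box's F approximation, $f_2 = (f_1 + 2)/(A_2 - A_1^2)$ and $b = f_1/(1 - A_1 - f_1/f_2)$, with $A_1 = (p + q + 1)/(2\nu)$ and $A_2 - A_1^2 = (p^2 + q^2 - 5)/(12\nu^2)$. The helper name is hypothetical. Turning $b$ into a significance point also needs an F percentage point, which the standard library does not provide (a package such as SciPy, e.g. `scipy.stats.f.ppf`, could supply it). So the sketch stops at $f_1$, $f_2$ and $b$.

```python
def f_approx_k2(p, q, nu):
    """Type VI F approximation for k = 2: M / b ~ F(f1, f2)."""
    f1 = p * q
    A1 = (p + q + 1) / (2 * nu)
    d = (p * p + q * q - 5) / (12 * nu ** 2)   # A2 - A1^2 for k = 2
    if d <= 0:
        raise ValueError("the type VI form needs A2 - A1^2 > 0")
    f2 = (f1 + 2) / d
    b = f1 / (1 - A1 - f1 / f2)
    return f1, f2, b

print(f_approx_k2(4, 4, 10))
```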

For p or q equal to 1 or 2, the exact distributions are known and provide simple tests (Wilks, 1932, 1935). In these cases $\Lambda$ and $\sqrt{\Lambda}$ respectively follow a type I distribution. The significance test can be made either by directly entering Thompson's tables of percentage points of the incomplete B-function, or by inverting the statistic to its equivalent 'variance ratio' form and using tables of F or of Fisher's z (Bartlett, 1934; Rao, 1948). As Bartlett (1938) pointed out, if p = 1 and q = 2 (or p = 2 and q = 1),

$$\frac{M}{C} = \frac{\nu - 2}{\nu}M$$

is distributed exactly as $\chi^2$. Substituting these values of p and q in the expressions for $\alpha_r$, we find that in this case all the coefficients are zero, which provides a useful check. If p and q were both unity, $A_2 - A_1^2$ would be negative and the type I form would be appropriate for the F approximation. Of course, we shall not need the method here, because $\sqrt{1 - \Lambda}$ is the sample correlation coefficient r, whose exact distribution is known. The exact distributions are also known in certain other cases (Wilks, 1935; Wald & Brookner, 1941). Their form is rather complicated, but they are useful for checking approximations. Table 10 shows the 5% significance points of M for a number of combinations of p and q, as given by the $\chi^2$ and F methods of approximation. In the cases chosen, the exact distribution is known, and it has been used to calculate the exact probability associated with each of these points. For comparison, the probability given by the series is also shown. It uses terms up to $\alpha_6$ in the asymptotic series and, for most values, up to the eighth term in the exponential series.

We see that, provided $\nu$ is sufficiently large, Bartlett's approximation is in good agreement with the exact values. The F approximation involves very little more labour and provides a worthwhile improvement. If $\nu$ is not large and one is doubtful whether these approximations will be sufficient, a rough but useful indication is provided by comparing the values obtained by the $\chi^2$ and F approximations. (In calculating the F approximation, one will already have calculated the quantities needed for the $\chi^2$ approximation.) If the two approximations give substantially the same value, this may generally be taken as an indication that the approximation is adequate. If they differ markedly, a more accurate value should be calculated from the series.

6·22. Case $p_i = 1$, $i = 1, 2, \ldots, k$

If the k groups each contain only one variate, the hypothesis tested is that each variate is independent of all the others. The $\Lambda$ criterion then becomes the determinant of the sample correlation matrix; e.g., if k = 3,

$$\Lambda = \begin{vmatrix} 1 & r_{12} & r_{13} \\ r_{21} & 1 & r_{23} \\ r_{31} & r_{32} & 1 \end{vmatrix},$$




Table 10. 5% significance points for M

                        χ² (Bartlett)                       F (Box)
                              Probability                      Probability
  p    q    ν        M       Exact    Series         M       Exact    Series
  1    1    9      4·610    0·0494    0·0494       4·592    0·0499    0·0499
  1    5   10     17·032    0·0024                17·542    0·0555    0·0555
           20     13·419    0·0518                13·504              0·0604
  1   10   20     26·153    0·0000                27·022    0·0562
           40     21·538    0·0525                21·690
  2    2    9     13·137    0·0515    0·0615      13·200    0·0506
  2    5   10     30·512              0·0737      31·664              0·0614
           20     22·884              0·0529      23·053
  2   10   20     46·534    0·0753    0·0753      48·104    0·0595    0·0595
           40     37·505    0·0535    0·0536      37·775    0·0507    0·0607
  4    4   10     47·811    0·0945    0·0940      49·996    0·0735    0·0731*
           20     33·931    0·0542    0·0542      34·210    0·0512    0·0512

* With ν = 10 and p = q = 4, six terms were taken in the asymptotic series and twelve in the exponential series; greater accuracy can be obtained by taking more terms in the asymptotic series.


where $r_{ij}$ is the usual sample product-moment correlation coefficient between the $i$th and $j$th variates. When k = 2 the criterion is simply $1 - r_{12}^2$.

The statistic is useful in supplying an overall test of independence between the k variates. For example, when k = 5 there will be ten individual correlation coefficients. Even when the null hypothesis that all the variates are uncorrelated is true, we shall often expect to come across individual coefficients which are 'significant'. In such a case it is appropriate to apply the overall test before testing individual correlations. Again, the expressions for the coefficients simplify, and, choosing $\rho$ so that $\alpha_1 = 0$, we find





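$$f = \tfrac12 k(k-1), \qquad \rho = 1 - \frac{2k+5}{6\nu}, \qquad \beta = \frac{2k+5}{6},$$
$$\alpha_1 = 0, \qquad \alpha_2 = \frac{k(k-1)}{288}(2k^2 - 2k - 13),$$
$$\alpha_3 = \frac{k(k-1)}{3240}(k-2)(2k-1)(k+1),$$
$$\alpha_4 = \frac{k(k-1)}{34560}(16k^4 - 32k^3 - 252k^2 + 268k + 1147),$$
$$\alpha_5 = \frac{k(k-1)}{136080}(k-2)(k+1)(2k-1)(8k^2 - 8k - 97),$$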

Jc(1c— 1 ) 

= 78M^08^ 967!8 “ 1,488fc5_ 12l676&4 + 27 > 6327 ‘ :3+137 > 490F_ 161 > 654fc_562 ’ 103 )' 


For the $\chi^2$ approximation, the argument of §3·1 shows that we should take

$$\frac{1}{C} = 1 - \frac{2k+5}{6\nu}.$$

Thus $\left(1 - \dfrac{2k+5}{6\nu}\right)M$ will be approximately distributed as $\chi^2$ with $\tfrac12 k(k-1)$ degrees of freedom.
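The overall test of independence can be sketched end to end. The steps are: take $\Lambda$ as the determinant of the sample correlation matrix, form $M = -\nu\log\Lambda$, scale it by $1 - (2k+5)/(6\nu)$, and refer the result to $\chi^2$ with $\tfrac12 k(k-1)$ degrees of freedom. The function names are hypothetical, and ν is passed in rather than derived from a sample size. The $\chi^2$ tail uses the closed forms for integer degrees of freedom, with the error-function form when the degrees of freedom are odd.

```python
import math

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n, result = len(a), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def chi2_sf(x, df):
    """P(chi-square_df > x) from the closed forms for integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = math.exp(-x / 2)
        total = term
        for i in range(1, df // 2):
            term *= x / (2 * i)
            total += term
        return total
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df + 1) // 2):
        total += term
        term *= x / (2 * i + 1)
    return total

def independence_test(R, nu):
    """Lambda = |R|, scaled statistic, degrees of freedom and chi-square tail."""
    k = len(R)
    lam = det(R)
    M = -nu * math.log(lam)
    scaled = (1 - (2 * k + 5) / (6 * nu)) * M
    df = k * (k - 1) // 2
    return lam, scaled, df, chi2_sf(scaled, df)

R = [[1.0, 0.4, 0.2], [0.4, 1.0, 0.3], [0.2, 0.3, 1.0]]
print(independence_test(R, 20))
```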

For the F approximation we have

$$f_1 = \tfrac12 k(k-1), \qquad A_1 = \frac{2k+5}{6\nu}, \qquad A_2 = \frac{k^2 + 3k + 2}{6\nu^2}.$$

For k = 2 and 3 we use the type I form, and for $k \geq 4$ the type VI form, since

$$A_2 - A_1^2 = \frac{2k^2 - 2k - 13}{36\nu^2}$$

is negative when k = 2 or 3 and positive for larger values of k. We then calculate $f_2$ and $b$, required in this approximation, by formulae (77) and (76) respectively.


7. Summary and conclusions 

For a particular class of likelihood criteria, whose moments appear as products of Γ functions, a general method is described for obtaining probability levels when the null hypothesis is true. A number of statistics whose moments appear in this form are referred to, and a general method is developed to obtain:

(a) A series which is in close agreement with the exact distribution.

(b) An approximate solution, using a single χ² distribution, which is sufficiently accurate for moderate or large samples.

(c) A rather better approximation, using a single F distribution, giving close agreement even when the samples are rather small.

The method is illustrated for the following two general statistics:

(1) Tests for constancy of variance and covariance

(a) Univariate case. The F approximation is of the same order of accuracy as Hartley's (1940) series solution, although it requires very much less calculation, and significance may be judged by consulting tables of the significance points of the variance ratio F alone.

(b) Multivariate case. The series solution shows remarkably close agreement with the exact distribution when this is known, and is used in other cases to compare approximations. The χ² approximation does not correspond with that found by Bishop (1939), but is, in fact, simpler and more accurate.

The series confirms the accuracy of significance points found by fitting a type I curve to the first two moments of l_1. The calculation of the moments involved in this method renders it too laborious for routine use, and Bishop suggested two working approximations. The F approximation developed here is more accurate than these approximations, whilst it involves no more labour and can be used when the sample sizes are unequal.




(2) Wilks's test for independence of k groups of variates

The asymptotic series and the χ² and F approximations are derived for this case, and the relation of the results to those of Wald & Brookner, Bartlett and Rao is discussed. The exact distribution is used to assess the accuracy of the proposed methods in a number of cases. The probabilities given by the series are found to be in excellent agreement with the true values, even for fairly small samples. Provided the sample sizes are not too small, the χ² and F approximations will be sufficiently accurate. The latter provides the better approximation and allows the sample size to be rather smaller than is possible with the χ² approximation. When the number of variates in each group is one, we have a test criterion for the hypothesis that k variates are mutually independent, and the same procedure provides the series solution and simple approximations for tests of significance.

In conclusion, I wish gratefully to acknowledge the help and guidance I have received from Dr H. O. Hartley throughout this investigation.


REFERENCES 

Barnes, E. W. (1899). Mess. Math. p. 64.

Bartlett, M. S. (1934). Proc. Camb. Phil. Soc. 30, 327.

Bartlett, M. S. (1937). Proc. Roy. Soc. A, 160, 268.

Bartlett, M. S. (1938). Proc. Camb. Phil. Soc. 34, 33.

Bartlett, M. S. (1947). J.R. Statist. Soc. Suppl. 9, 176.

Bishop, D. J. (1939). Biometrika, 31, 31.

Bishop, D. J. & Nair, U. S. (1939). J.R. Statist. Soc. Suppl. 6, 89.

Brown, G. W. (1939). Ann. Math. Statist. 10, 119.

George, A. (1946). Sankhyā, 1, 20.

Hartley, H. O. (1940). Biometrika, 31, 249.

Hartley, H. O. & Pearson, E. S. (prefatory note) (1946). Biometrika, 33, 296.

Mauchly, J. W. (1940). Ann. Math. Statist. 11, 204.

Milne-Thomson, L. M. (1933). The Calculus of Finite Differences. Macmillan.

Nair, U. S. (1939). Biometrika, 30, 274.

Nayer, P. P. N. (1936). Statist. Res. Mem. 1, 38.

Neyman, J. & Pearson, E. S. (1928). Biometrika, 20A, 175 and 263.

Neyman, J. & Pearson, E. S. (1931). Bull. int. Acad. Cracovie, A, p. 460.

Neyman, J. & Pearson, E. S. (1936). Statist. Res. Mem. 1, 1.

Neyman, J. & Pearson, E. S. (1938). Statist. Res. Mem. 2, 25.

Pearson, E. S. & Wilks, S. S. (1933). Biometrika, 25, 353.

Pitman, E. J. G. (1939). Biometrika, 31, 200.

Plackett, R. L. (1948). J.R. Statist. Soc. 109, 457.

Plackett, R. L. (1947). Biometrika, 34, 311.

Rao, C. R. (1948). Biometrika, 35, 71.

Thompson, C. M. (1941). Biometrika, 32, 151.

Thompson, C. M. & Merrington, M. (1946). Biometrika, 33, 296.

Tukey, J. W. & Wilks, S. S. (1946). Ann. Math. Statist. 17, 318.

Votaw, D. F. (1948). Ann. Math. Statist. 19, 447.

Wald, A. & Brookner, R. J. (1941). Ann. Math. Statist. 12, 137.

Welch, B. L. (1935). Biometrika, 27, 145.

Welch, B. L. (1936). Statist. Res. Mem. 1, 52.

Wilks, S. S. (1932). Biometrika, 24, 471.

Wilks, S. S. (1935). Econometrica, 3, 309.

Wilks, S. S. (1946). Ann. Math. Statist. 17, 257.

Wishart, J. (1928). Biometrika, 20A, 32.



NOTE ON APPROXIMATIONS TO THE POWER FUNCTION OF THE 

‘2x2 COMPARATIVE TRIAL’ 


By G. P. SILLITTO 

Research Department, Imperial Chemical Industries, Ltd., Nobel Division 

The power function of the 2 × 2 table arising in what Barnard (1947) has called the '2 × 2 comparative trial' and Pearson (1947) has designated 'Problem II' has been discussed by P. B. Patnaik (1948) in this journal. In his investigation he made approximations which represent binomial or hypergeometric distributions by normal distributions. In certain cases he examined the adequacy of the approximations numerically, by comparing values of the approximate power function he derived with values calculated exactly.

There is some interest in comparing Patnaik's approximation with that obtained by using the angular transformation for a binomial variate. Let P be the probability that an individual will possess a given character, and let $f = r/n$ be the proportion, or relative frequency, of individuals with this character observed in a random sample of n. It is then known that

$$x = \arcsin\sqrt{f}, \tag{1}$$

where the angle is measured in radians, is distributed approximately normally about a mean of $\arcsin\sqrt{P}$ with a standard deviation of $1/(2\sqrt{n})$. The problem of comparing two observed values of f thus becomes equivalent to that of comparing variables from normal populations with known standard deviations.

The transformation (1) has of course been known for a long time. It has recently been discussed by Eisenhart (1947), who gives a bibliography, and Paulson & Wallis (1947) have indicated its application in planning and analysing experiments for comparing two percentages. Bartlett (1937) has given a table of the transformation with the angle in radians, while Fisher & Yates (1938) have tabulated it with the angle in degrees. The reason for using a transformation is essentially one of convenience, and the radian as unit leads to a slightly simpler expression for the variance, so there are advantages in employing the latter unit. For two independent samples, suppose the observed relative frequencies $f_1$ and $f_2$ are based on m and n observations respectively. The difference between their transformations $x_1$ and $x_2$ will then be approximately normally distributed with known standard error,

$$\sigma = \tfrac12\sqrt{\frac{1}{m} + \frac{1}{n}}. \tag{2}$$

Thus $(x_2 - x_1)/\sigma$ will vary about a mean

$$\mu = (\arcsin\sqrt{P_2} - \arcsin\sqrt{P_1})/\sigma, \tag{3}$$

with unit standard deviation. If the null hypothesis is true, $P_1 = P_2$ and $\mu = 0$. Defining $u_\alpha$ as follows,

$$\alpha = \frac{1}{\sqrt{2\pi}}\int_{u_\alpha}^{\infty} e^{-\frac12 t^2}\,dt, \tag{4}$$

348 Approximations to the power function of the ‘2x2 comparative trial' 


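the chance of establishing significance at the 100α% level when $P_1 \neq P_2$, i.e. the power of the test, will be approximated as follows:

(a) For the one-sided test

$$\text{Power} = \frac{1}{\sqrt{2\pi}}\int_{u_\alpha - \mu}^{\infty} e^{-\frac12 t^2}\,dt. \tag{5}$$

(b) For the two-sided test

$$\text{Power} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{-u_{\frac12\alpha} - \mu} e^{-\frac12 t^2}\,dt + \frac{1}{\sqrt{2\pi}}\int_{u_{\frac12\alpha} - \mu}^{\infty} e^{-\frac12 t^2}\,dt. \tag{6}$$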


The position is illustrated in Fig. 1. For the one-sided test the power is the area under the curve centred at $\mu$ lying to the right of the ordinate at $u_\alpha$. For the two-sided test it is the sum of the areas under this curve lying to the left of the ordinate at $-u_{\frac12\alpha}$ and to the right of the ordinate at $u_{\frac12\alpha}$.


[Fig. 1: Distribution on the null hypothesis, P_1 = P_2.]
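The approximate power in (5) and (6) takes only a few lines to evaluate. This is a minimal sketch using Python's `statistics.NormalDist` for the normal integral and its inverse. The function name is hypothetical, and $P_2 > P_1$ is assumed for the sign of $\mu$ in the one-sided case. Run on the m = n = 15 cases, it should reproduce the angular-transformation column of Table 2(a) to the three decimals printed there.

```python
import math
from statistics import NormalDist

_Z = NormalDist()   # standard normal distribution

def angular_power(P1, P2, m, n, alpha, two_sided=True):
    """Approximate power of the 2 x 2 comparative trial via x = arcsin(sqrt(f))."""
    sigma = 0.5 * math.sqrt(1 / m + 1 / n)                  # standard error (2)
    mu = (math.asin(math.sqrt(P2)) - math.asin(math.sqrt(P1))) / sigma   # (3)
    if two_sided:
        u = _Z.inv_cdf(1 - alpha / 2)                        # u_{alpha/2}
        return _Z.cdf(-u - mu) + (1 - _Z.cdf(u - mu))        # both tails, (6)
    u = _Z.inv_cdf(1 - alpha)                                # u_alpha
    return 1 - _Z.cdf(u - mu)                                # one tail, (5)

print(round(angular_power(0.3, 0.4, 15, 15, 0.10), 3))
print(round(angular_power(0.1, 0.7, 15, 15, 0.10), 3))
```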



To obtain an indication of the usefulness of the approximation to the power of the 2 × 2 comparative trial afforded by the transformation (1) and the normal theory just outlined, the power of the test has been calculated for the cases in which Patnaik evaluated the true power of the 2 × 2 table. The results are given in Tables 1(a), (b), 2(a) and (b) below. The first two tables correspond to Patnaik's Tables 2(a) and (b). The bracketed figures are the power as calculated from the approximate theory given above, while the other figures are the true power, as given by Patnaik. Similarly, in Tables 2(a) and (b), the exact values are quoted from Tables 3(a) and (b) in Patnaik's paper, and the corresponding approximate values from the present method are shown for comparison.

It is evident that the approximation is quite good over the range covered by the present tables. In fact it appears to be slightly better on the whole than Patnaik's: a detailed comparison of the figures with those in his paper gives the results shown in Table 3. Omitting the $P_1 = P_2$ diagonals of Tables 1(a) and (b), there are 172 cases in which numerical comparison is possible.

It may also be mentioned that, in the region where the true power is 0·9 or greater, which is probably the most important in practice, Patnaik's approximations generally tend to overestimate the power, whilst the present method tends to underestimate it. Some indication of the practical importance of the difference between the approximations can be obtained in cases where the true power is known. The approximations are used to estimate n, the sample size required to afford a test of assigned power,



G. P. SlLLITTO 


349 


Table 1 (a). Power function for m = 18, n = 12.

Significance level α = 0·10 in the two-sided test

Approximate values derived through the angular transformation are shown in parentheses.

  P1 \ P2   0·1      0·2      0·3      0·4      0·5      0·6      0·7      0·8      0·9
  0·9      1·000    0·997    0·976    0·907    0·773    0·674    0·356    0·170    0·088
          (1·000)  (0·996)  (0·974)  (0·917)  (0·800)  (0·619)  (0·398)  (0·197)  (0·100)
  0·8      0·997    0·972    0·882    0·732    0·494    0·289    0·144    0·094    0·202
          (0·996)  (0·966)  (0·882)  (0·733)  (0·533)  (0·326)  (0·166)  (0·100)
  0·7      0·977    0·880    0·722    0·487    0·272    0·136    0·092    0·159    0·393
          (0·974)  (0·882)  (0·714)  (0·501)  (0·298)  (0·163)  (0·100)
  0·6      0·918    0·742    0·615    0·287    0·142    0·099    0·156    0·320    0·610
          (0·917)  (0·733)  (0·601)  (0·290)  (0·169)  (0·100)
  0·5      0·796    0·634    0·309    0·166    0·100    0·156    0·309    0·634    0·796
          (0·800)  (0·633)  (0·298)  (0·169)  (0·100)
  0·4      0·610    0·320    0·165    0·099    0·142    0·287    0·516    0·742    0·918
          (0·619)  (0·326)  (0·163)  (0·100)
  0·3      0·393    0·159    0·092    0·138    0·272    0·487    0·722    0·880    0·977
          (0·398)  (0·166)  (0·100)
  0·2      0·202    0·094    0·144    0·289    0·494    0·732    0·882    0·972    0·997
          (0·197)  (0·100)
  0·1      0·088    0·170    0·366    0·574    0·773    0·907    0·975    0·997    1·000
          (0·100)


Table 1 (b). Power function for m = 18, n = 12.

Significance level α = 0·02 in the two-sided test

Approximate values derived through the angular transformation are shown in parentheses.

  P1 \ P2   0·1      0·2      0·3      0·4      0·5      0·6      0·7      0·8      0·9
  0·9      0·998    0·976    0·902    0·752    0·532    0·290    0·103    0·017    0·006
          (0·996)  (0·971)  (0·897)  (0·759)  (0·564)  (0·353)  (0·173)  (0·060)  (0·020)
  0·8      0·961    0·882    0·694    0·490    0·265    0·116    0·035    0·016    0·055
          (0·971)  (0·870)  (0·693)  (0·476)  (0·275)  (0·127)  (0·046)  (0·020)
  0·7      0·909    0·714    0·460    0·248    0·111    0·041    0·021    0·050    0·177
          (0·897)  (0·693)  (0·453)  (0·248)  (0·111)  (0·041)  (0·020)
  0·6      0·767    0·500    0·262    0·111    0·041    0·023    0·048    0·138    0·361
          (0·769)  (0·476)  (0·248)  (0·107)  (0·039)  (0·020)
  0·5      0·625    0·293    0·124    0·045    0·022    0·045    0·124    0·293    0·625
          (0·564)  (0·275)  (0·111)  (0·039)  (0·020)
  0·4      0·361    0·138    0·048    0·023    0·041    0·111    0·262    0·500    0·767
          (0·353)  (0·127)  (0·041)  (0·020)
  0·3      0·177    0·050    0·021    0·041    0·111    0·248    0·460    0·714    0·909
          (0·173)  (0·046)  (0·020)
  0·2      0·055    0·016    0·035    0·115    0·265    0·490    0·694    0·882    0·961
          (0·060)  (0·020)
  0·1      0·006    0·017    0·103    0·290    0·532    0·752    0·902    0·976    0·998
          (0·020)





Table 2 (a). Some values of the power function for the two-sided test for m = n = 15

                          α = 0·10                     α = 0·02
                               Approximate                  Approximate
                      Exact    value from          Exact    value from
  P1      P2          value    angular transf.     value    angular transf.
  0·3     0·4         0·141    0·156               0·034    0·042
  0·6     0·8         0·306    0·334               0·112    0·133
  0·1     0·3         0·389    0·409               0·149    0·180
  0·2     0·7         0·806    0·893               0·680    0·713
  0·05    0·5         0·910    0·922               0·736    0·770
  0·1     0·6         0·919    0·926               0·739    0·778
  0·2     0·8         0·974    0·970               0·872    0·885
  0·1     0·7         0·980    0·978               0·894    0·910


Table 2 (b). Some values of the power function for the two-sided test for m = n = 30

                          α = 0·10                     α = 0·02
                               Approximate                  Approximate
                      Exact    value from          Exact    value from
  P1      P2          value    angular transf.     value    angular transf.
  0·05    0·3         0·884    0·864               0·631    0·662
  0·1     0·4         0·886    0·878
  0·3     0·7                                               0·805
  0·2     0·6         0·945    0·948                        0·828
  0·1     0·5         0·977    0·974               0·902
  0·2     0·7         0·993    0·993               0·965    0·961


Table 3. Comparison of approximations to the power function of the 2 × 2 comparative trial

               No. of cases in      No. of indecisive cases      No. of cases in
               which Patnaik's      (equally discrepant, or      which the angular
               first approximation  indeterminate owing to       transformation is
  Table no.    is nearer to the     rounding-up)                 nearer to the
               exact value                                       exact value
  1(a)         6                    28                           38
  1(b)         6                    36                           30
  2(a)         0 (1)                3 (3)                        13 (12)
  2(b)         0 (0)                0 (1)                        12 (11)
  Totals       12                   67                           93

Bracketed figures in the body of this table refer to Patnaik's second approximation.

























$\beta$, when a particular significance level, $\alpha$, has been chosen. In his Table 7, Patnaik estimated n for selected cases of this kind, using the two-sided test with $\alpha = 0{\cdot}10$ and $0{\cdot}02$. Table 4 contains his results together with those obtained from the angular transformation approximation. To obtain a simple expression for n using this approximation, the left-hand term in expression (6) may be neglected. It makes an appreciable contribution to the power only when the power is rather low, unless $\alpha$ is larger than is usual in significance testing. Defining the relation between $\beta$ and $u_\beta$ as that between $\alpha$ and $u_\alpha$ in equation (4), it follows that for the two-sided test

$$\mu = u_{\frac12\alpha} - u_\beta. \tag{7}$$

Hence, using equations (2) and (3), it is seen that

$$\frac{1}{m} + \frac{1}{n} = \frac{4(\arcsin\sqrt{P_2} - \arcsin\sqrt{P_1})^2}{(u_{\frac12\alpha} - u_\beta)^2}. \tag{8}$$

In the particular case m = n, therefore, n is given by

$$n = \frac{\tfrac12(u_{\frac12\alpha} - u_\beta)^2}{(\arcsin\sqrt{P_2} - \arcsin\sqrt{P_1})^2}. \tag{9}$$

This formula has been given by Paulson & Wallis (1947) with reference to the one-sided test. They have also provided a nomogram for determining sample sizes in the one-sided case. It can, of course, be used for a two-sided experiment if their $\alpha$ is taken as one-half the tolerable risk of an error of the first kind. Applying (9) to the cases of Patnaik's Table 7 gives the values in Table 4.
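Equation (9) is straightforward to evaluate. This minimal sketch uses `statistics.NormalDist` for the normal percentage points, with $u_\beta$ defined so that the power equals the upper-tail area beyond $u_\beta$. Rounding the estimate up to the next integer is an assumption on our part, though it appears consistent with most entries of Table 4. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_equal(P1, P2, alpha, power):
    """Equation (9): common sample size n (m = n) for the two-sided test."""
    z = NormalDist()
    u_half_alpha = z.inv_cdf(1 - alpha / 2)   # upper alpha/2 point
    u_beta = z.inv_cdf(1 - power)             # power = P(Z > u_beta)
    delta = math.asin(math.sqrt(P2)) - math.asin(math.sqrt(P1))
    n = 0.5 * (u_half_alpha - u_beta) ** 2 / delta ** 2
    return n, math.ceil(n)                    # rounding up is an assumption

print(sample_size_equal(0.05, 0.5, 0.10, 0.910))
print(sample_size_equal(0.05, 0.3, 0.10, 0.884))
```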


Table 4. Estimation of the sample size from Patnaik's first approximation and from the angular transformation approximation

  True                          True power*          Patnaik's estimate   Estimate of n from
  sample                                             of n                 equation (9)
  size = n   P1      P2       α = 0·10  α = 0·02   α = 0·10  α = 0·02   α = 0·10  α = 0·02
  15         0·05    0·5      0·910     0·736      11        11         15        14
  15         0·1     0·6      0·919     0·739      13        12         15        14
  15         0·1     0·7      0·980     0·894      13        12         16        16
  30         0·05    0·3      0·884     0·631      26        23         33        29
  30         0·1     0·5      0·977     0·902      27        26         31        31
  30         0·2     0·7      0·993     0·965      28        28         31        31

* The power of the two-sided test.


The angular transformation (1) is, of course, a particular case of a type of transformation recently considered by Anscombe (1948). Using his methods, it is not difficult to show that, for finite numbers of observations and $P \neq 0{\cdot}5$, the expectation of x is not exactly $\arcsin\sqrt{P}$. It differs from it by a function of n and P which increases as n decreases and as $|P - 0{\cdot}5|$ increases. The higher moments of x show rather similar departures from those of a normal variate. It is to be expected, therefore, that the angular transformation approximation is unlikely to remain satisfactory for smaller n and extreme frequencies. On the present evidence, however, it appears to be satisfactory for the values of n and P which are of interest in many practical problems.

Biometrika 36

Grateful acknowledgement is made to Prof. E. S. Pearson for his interest and for helpful 
suggestions as to the framing of this note. 


REFERENCES 

Anscombe, F. J. (1948). The transformation of Poisson, binomial, and negative-binomial data. Biometrika, 35, 240.

Barnard, G. A. (1947). Significance tests for 2 × 2 tables. Biometrika, 34, 123.

Bartlett, M. S. (1937). Sub-sampling for attributes. J.R. Statist. Soc. Suppl. 4, 131.

Eisenhart, C. (1947). Inverse Sine Transformation of Proportions. Chapter 16 of Selected Techniques of Statistical Analysis by the Statistical Research Group, Columbia University; edited by Eisenhart, Hastay and Wallis. New York: McGraw-Hill Book Co. Inc.

Fisher, R. A. & Yates, F. (1938). Statistical Tables for Biological, Agricultural and Medical Research, 3rd ed. 1948, p. 56. Edinburgh: Oliver and Boyd.

Patnaik, P. B. (1948). The power function of the test for the difference between two proportions in a 2 × 2 table. Biometrika, 35, 157.

Paulson, E. & Wallis, W. A. (1947). Planning and Analyzing Experiments for Comparing Two Percentages. Chapter 7 of Selected Techniques of Statistical Analysis by the Statistical Research Group, Columbia University; edited by Eisenhart, Hastay and Wallis. New York: McGraw-Hill Book Co. Inc.

Pearson, E. S. (1947). The choice of statistical tests illustrated on the interpretation of data classed in a 2 × 2 table. Biometrika, 34, 139.



[ 353 ] 


THE DISTRIBUTION OE ‘STUDENT’S’ t IN RANDOM SAMPLES OE 
ANY SIZE DRAWN EROM NON-NORMAL UNIVERSES 

By A. K. GAYEN, St Catharine's College,, University of Cambridge 

1. Introduction 

Tlie effect of universal non-normality on ‘ Student’s ’ t has been studied so far either by way 
of particular numerical examples or by approximations to its sampling distribution based 
on the large sample assumption, E. S. Pearson (1928, 1929) has shown in his experimental 
investigation that the effect of universal ‘excess’ and of ‘skewness’ on ‘Student’s’ ratio 2 
(which is related to t by t — zf(n — l)) may be considerable. He has also furnished a small 
table, based on experimental results, showing some actual values of the probability integral 
for the ratio in samples of 2, 5,10 and 20 from a few universes with specified values of ‘skew¬ 
ness’ and ‘excess’. M. S. Bartlett (1935) obtained some theoretical results for approximately 
representing the distribution of t, taking into account the universal ‘ skewness ’ and ‘ excess ’. 
The approximation is based on the assumption of large samples and is not very satisfactory, 
for in that approach, as he himself has also pointed out, the term ‘approximation’ is perhaps 
misleading. Still from the form of the expression obtained, it has been observed that the 
effect of the ‘skewness’ A 3 in the original distribution is the more serious and that of the 
‘ excess ’ A 4 is small. Tins, of course, confirms Pearson’s experimental results, but the resulting 
expression cannot clearly furnish a quantitative measure of the corrections to be applied 
to normal theory probabilities of t, when the sample size is small. R. C. Geary (1936) obtained 
an expression for the distribution of t in samples of any size drawn from a slightly asymmetrical 
universe. It consists of two components, one being the ‘normal theory’ t and the other a 
term in A 3 , which has been called the ‘ corrective function ’ due to it. He has also given a table 
of corrective tail-area probabilities for some representative values of n. In a recent com¬ 
munication (1947) he has given from a different approach an approximate formula for the 
frequency of t, correct to n~ 2 , n being the size of sample. Erom this and analogous results on 
non-normal Fisher’s z, he has shown, by a few illustrative examples, that the inferences 
drawn from the standard tables may be seriously in error even in some cases where the 
parent is not considerably non-normal. He has accordingly pointed out the primary import¬ 
ance of testing for normality from the available samples and has suggested that when 
universal normality cannot be assumed, the standard tables should be corrected by using 
the sample estimates of A 3 and A 4 , etc., in conjunction with the theoretical results deduced 
therein. But so far as the frequency of t is concerned, satisfactory measures of probabilities 
for small samples cannot be obtained from his results, since they are mainly based on large 
sample assumptions. Also the first few terms of his result, as far as n~ l , agree with those of 
Bartlett, which, as has been pointed out, are not very satisfactory. 

The purpose of the present investigation is to obtain the form of the corrective terms due 
to population cumulants A 3 , A 4 , etc., in the frequency function of t for any size of sample. 
For this the parent population will be specified by the Edgeworth series. It has been proved 
rigorously by Cramer (1928) that this series gives a real asymptotic expansion of any universe 
/(>*), in powers of ryU where v 0 is the number of sources of ‘elementary errors’. Inclusion of 


2 V* 



354 


Distribution of ‘ Student’s ’ t from non-normal universes 

terms of order vfr 1 , .. v^ ir gives rise to the first, second, third,..., (r+ l)th approxima¬ 

tions to the law of error. We shall consider mainly the third approximation, including terms 
in A 3 , A 4 and A§, but shall also deal with the fourth approximation which takes in A 6 , A 3 A 4 
and A 3 . 

Probabilities of values of t, obtained from the derived formula, for the parent specified by 
the third approximation, are in satisfactory agreement with the experimental determina¬ 
tions made so far, even hi cases where the sampled populations are represented by the 
exponential curve or a very skew Type III curve of Pearson, Accordingly, it appears' that 
the derived expressions for ‘Student’s’ i may perhaps provide quite satisfactory estimates 
of the probabilities, for a fairly wide class of non-normal universes, especially when the 
sample size is not too small. 

The values of the cumulants of the population have been assumed to he known, and the 
question of their estimation has not been considered. 


2. Joint distribution or the sum and the sum oe squares 

OI' DEVIATIONS OE % SAMPLE OBSERVATIONS 


We shall first of all obtain the distribution of ‘ Student’s ’ t, with particular reference to the 
parent population being specified by the third approximation to the law of error. This 
will give us the corrective terms due to A 3 , A 4 and A|. It will be shown afterwards that 
similar corrective terms due to A B , A 3 A 4 and A| can be obtained by the same methods when 
the parent population is represented by the Edgeworth series up to the corresponding 
terms. 

Let us then consider the parent population to be specified by the third approximation, 
correct to 

f(x) = + + ( 2 - 1 ) 


where A 3 and A 4 are the measures of universal skewness and excess, 




1 

V ( 2 w) 


e-**\ 


(• 2 ’ 2 ) 


the normal function in standardized form, and 

#°(z) = <i>i x ) = (-!)” #„(*)0(*). ( 2-3 ) 

where H„(x) is the well-known Hermitian polynomial of degree v. 

Let x X) x 2> ..., x n be n independent sample observations drawn at random from the universe 
( 2 - 1 ), and let us adopt the notations 

= (24) 

1 1 

n .02 

and . 8 t ~'Z(z,-z)* = S' t - l = {n-l)s\ (2-5) 


where s 2 is the estimated variance. 



A. K. GrAYEN 


355 


By the use of Bartlett’s method (1935) the joint frequency density of [S^S^) (which is 
also the joint frequency density of (8 ± , *S'j), since the Jacobian of transformation is unity) 
may be written in the form 

M 

fi (S v St) = W(n- 1) —^ {D»W (n + 6) - QD t D 2 W(n+ 3)} 

+ — {D\W{n + 7) -12 D|Z> 2 lf(n + 5) + 12Z>ifP(» + 3)} 


+ [nD\ W(n+ 11) - 6(2 n + 3) DjD a TF (n 4- 9) + 36(m + 4) D\D\W(n +7) - 120£ a W{n + 5)}, 


when terms in A i other than in A 3 , A 4 and Af are neglected. In (2-6) 


( 2 - 6 ) 



^ (o) W(*»»)lla«lXW)'l 



(j8f;— 

J[2nn) x 2**“ r(^) 

(2-7) 

and 

L= ±_ D =± 

1_ av 2_ a^‘ 

(2-8) 


We now proceed to evaluate the derivatives involved in the expression (2-6). It is possible 
to transform the product operators of the form DfDj 2 (where v v = 0,1,2,...) into functions 
of the single operator D v 

For any typical function W(n (,), defined in (2-7), where n' 0 stands for (w-1), («+1), 
(n +3),.,., it can be shown that 


D?DfW(n' 0 ) = ^F(K)-(^j2) + ^ D?W(n ' 0 - 4)- ... 

+ ( — l) r (^, 2 ) Dl 1 W(n' 0 — 2r) + ... + ( —l^D^W^o — 2r a )] • ( 2 '9) 

By differentiating W(n’ 0 ) successively with respect to S 1 we get 
DiWK) = (-l)WK-2)(|), 

ww)=(-D‘ 

and so on. 

It will be observed that the numerical coefficients are those of the Hermitian polynomials 


( 5 )). 

m 3 tSA 2 ) 

\ n) + «-6) (n' n -8)\ %) r 


H^x) = x 


-- 1 ) t „-2 , v(v-l)(^-2 ) (v — 3) 4 __ 

""xir + 2^.2! -■ 



356 Distribution of ‘ Student's' t from non-normal universes 


So we can write, in general, 


w«) = (-ir^K~2v) 


i/sa-__ v(v-i) /^y- 2 m 

{\«J 2.V.(n' 0 ~2v)\n) \nj 

, v(v-l)(v-2)(v“3) (VT* (^Y 

+ 2 i .2\(n' 0 -’'2v + 2)(n' 0 -2v)\n) \n) 


( 2 - 10 ) 


For the present case as we have specified the population by the third approximation to the 
law of error we require the values of D\, for v = 0,1,2,... up to 6. 

Now using results (2-9) and (2-10) in (2-6), we obtain, after some simplification, the 
following expression for the joint frequency function of and S 2 : 


WM) - W{n- 1) [l+^{^(|) + 3(|)^(|) 


*b 




(n-l)/^yi tO| 


^ +ef- 2 ) hJ +3^——; I 

4! \ \n) \n) \nj (n + l)\w 


[*(!)•-«*+.)(»■ 


(»+i) 


(f 


( 2 - 11 ) 


where H^SJn) is the Hermitian polynomial of degree v in {Sfn), and W{n- 1) is given by 


(2-7). As a check it was found that, to the required approximation, 


gq($i, S 2 ) dS t gave the 


standard formula for the distribution of the sum of n sample observations. 


3. The frequency function of estimated variance 

Integrating (2-11) for S x between the limits — oo and oo, we find, for the frequency of S lt 
after some simplification, 

where F (/Sf 2 , (n— I)) is the frequency density of normal S 2 . For the distribution of s 2 = S 2 j (n -1), 
this becomes 

9i(', (n-l)s 2 ) (n-l)d(s 2 ). (3-2) 


Note that if we introduce the operator D 
be put in the form 


D 

™-d[S,l(n- 1)] 


the frequency function (3-1) can 


F( W — 1 ) + ^Z)§ 2 F (W + 3 )—|-Al^-^Z)3 aF ( tl+6 ) i (3-3) 


where S 2 has been omitted from the F’s. 



A. K. Gayen 


357 


4. The distribution of‘Student’s’ f 

Put t in (2-11) and integrate for <S' 2 from 0 to co, when we find that a typical term 

W{n- 1)/Si*#£» dS x dS % becomes 

n jri . 2 toi+w r[|(ra + r t + 2 r a )] _ F>dt 

<jn(n- l)Wi+«r[i(n--l)] [l + f*/(m-l)]««+ri+a^- 

It will be noticed that (4-1) gives the normal theory t for r 1 =’r a = 0. We shall denote the 
distribution of t by p(t). Thus we get by using (4-1) and after simplification 

..._W_ 1 {3(»-l)f-(2»-l)f«} 

f[ ) ^ln[(n -1)] r[i(» -1)] ‘ [1 + t z l(n -1)]»* 3 6(n -1) V(2wr) [1 + f 2 /(w -1)]«"+« 

i r[i(» + 2)] {3(w-l)-6(ft+l)« 2 -|-(w+l)< 4 } 

4 2inJ\n(n~l)] rft(n + 3)] [1 + f 2 /(w- 


rr if n + 2M 3(»-l)*(2iH-ll)-9(»-l)(* + 3)(2»-l)f« 1 

Ltl f — 3(?H-1) (n + 3) (2m + 13){ 4 + (w+ 1) (?H-3) (2n+5)i 6 | 
+ 3 144u,(u,-l)^wr[i(w + 5)][l + f 2 /(u,-l)]^+ li ) 

= p a (t) + \Px,{t) ~ KPxSt) + A8i>Ai(t). ( 4 ' 2 ) 

Obviously p 0 (t) is the normal theory t for a sample of size n, and p Aa (i), p Al (f) and p A §(f) are 
the corrective terms due to universal A a , A 4 and A|. 

As a check on the above expression for t, moments may be calculated directly, about 
t = 0 and to n~ 2 ; they are found to be 


/tiW = _ ^( 1+ i + -)’ 

/4(f) = 1+-(1 + A|)+-j(3-A 4 ) + ..., 

• V fb 

^ W== “S( 1+ S + -)’ 

/4(f) = 3 + - (9 — A 4 + 14A|) + ~ (102 — 30A 4 + 120A|) +.... 

71 71 


(4-3) 


These are in agreement with the asymptotic formulae for the moments of t for any universe 
as obtained by Geary (1936), up to first power of A 4 and second power of A 3 . All these cor¬ 
rective terms when integrated between the limits — co and co contribute nothing towards 

the integral, as may be easily deduced from the fact that p(t) dt = 1. 


5. The distribution of ‘Student’s ’t bob. the parent specified 

BY THE FOURTH APPROXIMATION TO THE LAW OF ERROR 

Let us specify the parent universe by the fourth approximation to the law of error, correct 
up to 1 ^“ , given by 

( 5-0 



358 Distribution of ‘ Student’s ’ £ from non-normal universes 

It is of interest to know the distribution of s 2 and 1 Student’s ’ t in random samples from tMs 
universe. We can arrive at these results by following a similar procedure, and the expressions 
corresponding to (2-6), (2*11) and (4*2) will have additional terms involving A 5 , A 3 A 4 and A|. 
We shall omit these intermediate steps and give below only the extra terms in the joint 
distribution of S t and S 2 and in the distribution of t. Thus corresponding to (2-11), the joint 
distribution j 2 {S lt fi 2 ), of S x and /S 2 will be given by 


ffz($u $a) = !7i($i> + W{n— 1) 


L 5! ) \n) \n) \nj (n-\-l)\nJ \nj 


+■ 


U'Ao A 


3 'M 

144 


■ IS, 
n\~ 
L \» 


+ 3(3» + 4) + 2l(n+ 4) + 3(3» + 32) ^ 


(n- 




+ 6(7» 2 + 43n+20)^ 1 j S 


+ 9(3n 2 + 39n+ 28) 


(I)] 


+ 


1 


(* + 0(»-i)(»)T 3 ’‘ <, ’‘ ,+68 ” +,5> (»)’ +9(3 " ,+ “”* +,1 ’‘ +20) (»)' 

3 » ($*\ 3 r,„ .__ . 


& \ 3 r 

■*■] (3m. 3 + 53n 2 + 129ti + 95) j 


(w + 3)(m + 1)(w-1) \ n 

+ sl([“'(«) ,+9 ” < “ +3 )(^) ,+2,( " +1 ><* +8 >(^) 


+ 1 


+ 


- 9 (n + 3) (3n + 32) + 135(» + 6) 

• (S=Ij (I) [»'<» + ’) (5)’ + 3»(2«< + + 33) (&)‘ 

+ 3(3«, 3 + 51u. 2 + 166W +60) | 

27 

(w + 1) (n 


,(&)* 

\n 


+ 3(23» 2 + 161k. + 96) 


(SA¬ 

UL 




>(f 


ft 2 (ft 2 + 14fl,+41) I —j -fw(3ft 3 -l-61w 2 + 27lft-}-225) 


(f 


+ 3(13ra 3 + 106?i 2 + 131 w+30) 


S- 


'] 


9 n 


(n+3)(n+I)(*-l)Vif) [^ 3 + 71 ^ + «^ + 635) 


(&V 

\n 


+ 


54m 2 


(m+5)(m+3) (n+l)(n- 


+ 3(29m 3 + 273w z + 533 m + 285) 

T) (f) 4 [ (4 ®’ + 4 3« a + 1189H-115) (!)]}] . (5*2) 


Since the expressions appearing with A 6 , A 3 A 4 and A| in (5-2) are odd functions in S lt the 
integral of g^S^S^) with respect to <S , 1 will be independent of these eumulants, so that the 
distribution of s 2 will be the same as that obtained in (3*2) and (3*3). 



A. K. Gaybn 


359 


For the distribution of t we proceed as before using (4-1), and obtain after,simplification 
(p(t) being given by (4-2)) 

. {[n a + (M-l) 2 ]i5+10(w-l)t 3 -16(w-l)2<} 

4Q^(2nn)]n{n-l) i [l+^1(71-1)]^+^ 


(40m 3 + 268m 2 + 182m + 51) t 1 - 3(150m 3 + 896m 2 + 869m + 297) i 5 1 
+ 16(m — 1) (68m 2 + 311m + 237) £ 3 — 15(m — l) 3 (25 m + 59)<) 
144[ A /(27 m)] m(m— l) 3 [1 +1 2 /(m —l)]« ,l + 7 ) 


1(64m 4 + 952m 3 + 3578w 2 + 301 1 m + 996)«»- 9(144m 4 +191 8m 3 + 701 lw 2 '! 
I + 7550 m+ 2913) t 7 + 27 (m- 1) (252m 3 + 2821m 2 + 8554m+• 7077) i 6 1 

l — 45(m— 1 ) 2 (541m 2 + 2086m+ 3049) I s +945(m — l) 3 (3M+13)t J 

1296[^/(277m)]m(m- l) 4 [l+i 2 /(M-l)] i(,l + 9 ) 


= Pott) + KP\,tt) ~ ^4 Phtt) + AaP A ij(S) + ^sP*.,tt) + A 3 A 4 p AsAl (J) + A 3 p A g(i). (5-3) 


Polynomials in t appearing in the numerator of the corrective functions are analogous in 
form to the Hermitian polynomials of the same degree. The form of the corrective term due 
to A 5 is simple but those due to A 3 A 4 and A 3 are rather complicated. 


6. Tail-area probabilities 


We are interested only in the tail area of these corrective terms and so consider the integrals 




For the first of the four right-hand members of (4-2) the integral has been tabulated. For the 
next member the two tail areas will be equal in magnitude but opposite in sign. In each of 
the other two members we shall have equal areas at both tails. These integrals may be 
tabulated for given values of n and i 0 , either directly from their algebraic expressions or more 
conveniently by using the Incomplete Beta function table of Pearson. For the sake of 
completeness we shall give here the expressions for tail areas of all the components, intro¬ 
ducing P 0 (i 0 ), P\ a tto)> P^tto) and P rttto) t° represent them. Thus, for one tail, the corrected 
probability integral P(i 0 ), will be given by 

P(to) = Potto)+W 3 tto)~\P>, l tto) + ^tto), (6*1)' 


(*-<« /m—1 1\ 

in which P 0 (i„) = J _ p 0 (t) dt = I * p 0 (t) dt = j-, ^J, 

ft 1 

r-u l [ + (»-i) °j 

[1+< 2 /(m-1 )]«»*» 


( 6 - 2 ) 

(63) 


which is evidently of order n~*. For calculation, however, it is more convenient to write 


We also have 


"\6V(2M7r)jM 2 ’ / l3V(2M7r)j 4 «l 2 1 T 


t, f _i " f 00 ( 1 T(^m) 

P h tto) = J_ Jx+tt)dt = J p Xl {t)dt = 


(m—1) 7m—1 1\ (m-1)(m + 2) 

24 2 ’2/ I2m "° 


/ »+1 A 

l 2 >2) 


+ 


(6-3 bis) 
3(m-1) 

1 0 (n + 1) h 

j[l + <g/(M-l)p l+2 > 

(6-4) 

(m + 4) (m — 1) /m+3 1\ 

24m m °\ 2 ’ 2 /’ 

(6-4 bis) 



360 


Distribution of ‘ Student's 5 t from non-normal universes 


and lastly, 


f-h T /A T< | (2« + 5) 1 (£») 

^s(W = J_ dt = J u dt ' \36(n-lj® ~ 1 )] r[i(»-l)] 

2(2n- 7) (n- 1) 3(2n+ H) (ft- l) a 

0+ (n+l){2n+5). 0 (*+l) (»+3) (2to + 5) 0 

X [l+ig/(w-l)]i^+ 4 > 

(w— 1) (271+5)1 ( n- 1 1\ ( («-— 1) (2m 3 + 5n + 8) ) / ra+1 1 

^ 2 ’ 2/ \ 24n / 2 ’2 


72 


? 


+ 


(m,-1)(2«, !! + 5m,+ 12)) T /ft+3 1\ 
7 24w j ,,0 \ 2 ’2/ 


( 6 - 6 ) 


(n — 1) (2n? + 5n+ 12)1 

•*-U 


72 n 


(n + 5 

iyi 

l 2 

2 ). 


where t a is any typical value of t, and the transformed variate u 0 is given by 


(6-5 bis) 


u n = 


1 +$/(»-!) 


and /„ (|v x , ^v 2 ), is the Incomplete B-funotion as defined by Pearson. Obviously P Aj (i 0 ) and 
P A ,(i 0 ) are of order w _1 . For < 0 = 0(0-5) 4 and n - 2,3,4,5,6, 7,9,13, 25 and co, values of 
P 0 (f 0 ), P x {t 0 ), P Ai (l 0 ) and P A g(< 0 ) have been tabulated (Table 1) with the help of Pearson’s 
Incomplete B-function table. The sample sizes are chosen to correspond to the degrees of 
freedom 1, 2, 3, 4, 5, 6, 8, 12, 24 and oo, as in Fisher’s table of z. 

It will be noticed that the effect of A 4 is very small, as has been inferred by various authors. 
The corrective tail area beyond | « 0 ( = 3 (which is nearly the 4 and 3 % points in normal 
samples of 5 and 6 respectively) is at most 1 % when the size of sample does not exceed 5, 
about 0-3 % for samples of 10 and about 0-1 % for samples of 20. The actual probability is 
obtained by multiplying these tail areas by the given value of A 4 , and they are additive to 
that of the normal theory t. 

As Geary points out, the effect of A 3 is rather serious, but so is that of A|. For a two-sided 
test the positive and negative tail areas will, of course, balance each other, but those due to 
Afj will not. 

For a comparative study of the form of the frequency curves of the normal theory t and 
those of the corrective functions for A 3 , A 4 and A§, diagrams have been constructed for 
n = 3, 6,13 and 25. They are shown in Figs. 1-4. As is to be expected, the curves of the cor¬ 
rective functions tend to those of the third, fourth and sixth derivatives of the normal 
function as n tends to infinity. 

An example to illustrate the use of the tables has been taken from Neyman & Pearson’s 
(1928) paper ‘On the use and interpretation of certain test criteria’. The given values of A 3 
and A 4 are not large and more or less satisfy A. L, Bowley’s (1928) criterion for moderately 
abnormal curves. 

Example (Neyman & Pearson, 1928, p. 203). Records of weight in a large population of 
mice for males between 120 and 140 days of age show; a mean value of 23-823 g., and for the 
frequency distribution (l Y = 0-086, = 2-687. Can the group of six mice with the weights 

22-5, 26-0, 20-5, 24-0, 18-0 and 24-5 be considered a random sample from this population? 

Here A 3 = + 0-293 (inferred as positive), A| - 0-086, A 4 = - 0-313, and we find t = - 1-038 
( = -<„, say) for n' = 5 degrees of freedom. Considering both tails, the normal theory 
probability is p„(| = 0-3468, , 



Table 1. Comp arative values of P 0 (t 0 ), P^(t 0 ) : P Al (t 0 ) and P A j(i 0 ) for different degrees of freedom 
n'(=n-l,n being the size of sample). P(i 0 ) = P 0 (g + A 3 P Aa (if 0 ) - A 4 P Aj (i 0 ) + A|P A ,(g 


h 

PM 

P\ S M 

Pa 4 W 

Pa;( 0 - 

■P o(^o) 

PwW 

Pu( f o) 

P„M 



n' = 

1 



n' 5 = 

2 


0-0 

0-5000 

0-0470 

0-0000 

0-0000 

0-6000 

0-0384 

0-0000 

0-0000 

0-5 

0-3524 

0-0689 

-0-0064 

-0-0066 

0-3333 

0-0495 

- 0-0069 

-0-0066 

1-0 

0-2500 

0-0666 

0-0000 

0-0044 

0-2113 

0-0697 

-0-0027 

0-0009 

1-6 

0-1872 

0-0622 

0-0047 

0-0147 

0-1362 

0-0563 

0-0025 

0-0118 

2'0 

0-1476 

0-0547 

0-0064 

0-0188 

0-0918 

0-0469 

0-0047 

0-0172 

2'6 

0-1211 

0-0476 

0-0066 

0-0195 

0-0648 

0-0375 

0-0061 

0-0179 

3-0 

0-1024 

0-0416 

0-0064 

0-0188 

0-0477 

0-0298 

0-0047 

0-0165 

3-6 

0-0886 

0-0368 

0-0059 

0-0176 

0-0364 

0-0239 

0-0041 

0-0145 

4-0 

0-0780 

0-0329 

0-0056 

0-0163 

0-0286 

0-0194 

0-0036 

0-0125 



n' - 

3 



n' = 

4 


0-0 

0-5000 

0-0332 

0-0000 

0-0000 

0-5000 

0-0297 

0-0000 

0-0000 

0-6 

0-3257 

0-0431 

-0-0062 

-0-0056 

0-3217 

0-0387 

-0-0056 

-0-0047 

1-0 

0-1955 

0-0540 

-0-0034 

- 0-0002 

0-1870 

0-0495 

-0-0036 

-0-0005 

1'5 

0-1153 

0-0513 

0-0013 

0-0098 

0-1040 

0-0473 

0-0006 

0-0084 

2-0 

0-0697 

0-0413 

0-0036 

0-0162 

0-0581 

0-0372 

0-0028 

0-0135 

2-5 

0-0439 

0-0310 

0-0039 

0-0157 

0-0334 

0-0266 

0-0031 

0-0139 

3-0 

0-0288 

0-0229 

0-0034 

0-0139 

0-0200 

0-0184 

0-0027 

0-0119 

3'6 

0-0197 

0-0169 

0-0028 

0-0114 

0-0124 

0-0127 

0-0021 

0-0095 

4-0 

0-0137 

0-0126 

0-0023 

0-0093 

0-0081 

0-0088 

0-0016 

0-0072 



n' = 

= 5 



n‘ = 

= 6 


mm 



mm 



1 

0-0000 

0-0000 

iBH 

0-3192 




0-3174 


-0-0044 

-0-0036 

1-0 

0-1816 

0-0397 

-0-0035 

-0-0005 

0-1780 

munlHfli 

-0-0033 

-0-0006 

l'B 

0-0970 

0-0440 

0-0002 

0-0074 

0-0921 

0-0413 

0-0000 

0-0066 

2-0 

0-0610 

0-0340 

0-0022 

0-0122 

0-0462 


0-0019 

0-0111 

2-6 

0-0272 

0-0234 

0-0025 

0-0126 



0-0021 

0-0113 

3-0 

0-0160 

0-0154 

0-0021 

0-0104 


0-0132 

0-0017 

0-0092 

3-6 

0-0086 

0-0099 

0-0016 

0-0079 



0-0012 

0-0067 

4-0 

0-0062 

0-0066 

0 0011 

0-0057 


1 

0-0008 

0-0047 



n’ 

= 8 




= 12 


0-0 

0-5000 

0-0222 

0-0000 

0-0000 

0-5000 


0-0000 

0-0000 

0-5 

0-3153 

0-0291 

-0-0037 

-0-0028 

0-3131 


-0-0027 

-0-0019 

1-0 

0-1733 

0-0384 

-0-0030 

-0-0005 

0-1685 


-0-0023 

- 0-0002 

1-5 

0-0860 

0-0371 

- 0-0002 

-0-0055 

0-0797 


-0-0004 

0-0042 

2-0 

0-0403 

0-0277 

0-0014 

0-0094 

0-0343 


0-0009 

0-0073 

2'5 

0-0185 

0-0177 

0-0016 

0-0095 

0-0140 


0-0011 

0-0072 

3-0 

0-0085 

0-0103 

0-0013 

0-0074 

0-0055 

SU® * y 

0-0008 

0-0053 

3-5 

0-0040 

0-0068 

0-0008 

0-0051 

0-0022 

0-0036 

0-0005 

0-0033 

4-0 

0-0020 

0-0032 

0-0005 

0-0033 

0-0009 

mmm 

0-0003 

0-0019 



n' 

= 24 


n' = oo 

0-0 

0-6000 

0-0133 

0-0000 

0-0000 

0-5000 




0-5 

0-3101 

0-0176 

-0-0015 ' 

- 0-0010 

0-3086 




1-0 

0-1636 

0-0238 

-0-0014 

- 0-0001 

0-1587 




1-6 

0-0733 

0-0232 

-0-0003 

0-0025 

0-0668 




2-0 

0-0285 

0-0164 

0-0004 

0-0043 

0-0228 




2 -5 

0-0098 

0-0090 

00005 

0-0041 

0-0062 




3-0 

0-0031 

0-0041 

0-0003 

0-0028 

0-0013 




3-5 

0-0009 

0-0016 

0-0002 

0-0016 

0-0002 




4-0 

0-0003 

0-0006 

0-0001 

0-0007 

0-0000 

















362 Distribution of 'Student'’s' t from non-normal universes 

whereas the estimate of the actual probability 

P(\t\>t 0 ) = O'3447. 

If we consider the negative tail only, then F {t ^ — f 0 ) = 0-1832. So that whichever tail is used 
the conclusion that there is no reason to doubt the origin of the sample is obvious. 



Pig. 1. Showing the terms in the distribution 

p(t) = Pa(t) + A a }3 Aa («)-A 4 J3 A4 («) + for n' = n~l - 2 degrees of freedom. 

Explanation of symbols for all' figures: 

* - • - • PoW i 0 - 0 - 0 PaJI) ! 

* -*-A- -4c -4 Px ( t ). 



Pig. 2. Showing the terms in the distribution 

p(t) = p„(t) + A 3 p Aa (j) — A 4 p^(j) -|- for n' = n— 1 = 6 degrees of freedom. 


A. K. Gayen 


363 



Fig, 3. Showing the terras in the distribution 

p{t) = Pa(t) + \p\ 3 {t) — KPXfi 1 ) + for n' = n- 1 = 12 degrees of freedom. 


Fig, 4. Showing tho terms in the distribution 

P{t) = p 0 (t) + A a? Ao (t) - AdJ Ai (t) + hlPxiit) for n' = n- 1 = 24 degrees of freedom. 

7. Asymptotic character op the series por p{t) 

Pointing out some of the asymptotic properties up to order n~ l - possessed by the expression 
Pa(t) + '\p^(t) (being his earlier approximation for ‘Student’s’ t), Geary (1936) observed 
that for samples of moderate size, the probability 

{Poit) + X sPxM dt 

might have quite an extended range of applicability, provided that at least the lower fre¬ 
quency constants A 4 , A 6 and A e are small. He considered this to be a matter for experimental 
investigation. 




364 


Distribution of 'Student's't from non-normal universes 
The probability function 

p(t) = M)+KnM - Kvxff )+ 

obtained in (4-2) possesses similar asymptotic properties up to order nr 1 . For it represents 
the frequency density of ‘Student’s ’t for samples drawn from any parent population if the 
samples are so large that the higher order terms in n~ ir for r ^ 3 can be neglected. On the 
other hand, it gives the frequency density of ‘ Student’s ’ t for samples of any size drawn from 
the particular parent population (2-1), when the A terms other than in A 3 , A 4 and A| are 
negligible. From Geary’s (1947) asymptotic formulae for the cumulants of t for any universe 
(his results 2-18), it is apparent that these higher A terms can only occur in terms in nr* 
for M 3, so that it is not unlikely that for samples of moderate sizep(f) has a quite extended 
range of applicability, provided higher order cumulants A 6 , A e , etc., are small. 

The expression for q(t) in (5-3), being correct to nr*, will provide a closer approximation 
to the actual distribution of t than the other two expressions, for samples from any universe 
if the terms in n~ l and higher negative powers of n are negligibly small, as also for samples 
of any size for the parent population specified by (5-1) if A s ... A$ are small. 


Table 2. Showing the experimental determinations of the true probability of t with their standard 
errors for various sampled universes together with the corresponding probability obtained 
from the frequency function p(t) of (4-2) 



•Size 

of 

sample 

Sampled 
populations with 

Normal 

theory 

probability 

Experimental 
determinations of 
the probability 
with their s.E.’s 

Probability 
as estimated 
from p(t) 
of (4-2) 

A1 =/h 

^4 

E. S. Pearson 

5 

1 

-0-60 

mm 

0-044 ±0-009 

■» 

(1929) 

5 


M2 

-■ ■ 

0-038 + 0-009 



5 

■jnjlil 

4-07 


0-029 ±0-008 



5 


0-30 


0-044 ±0-009 

0-043 


5 

- 

0-73 

| 

0-042 ±0-009 . 

0-048 

H. L. Rietz 

5 


-0-03 

0-067 


0-071 

(1939) 

5 

0-03 

-0-55 

0-067 



A. N. K. Nair 

5* 

warn 

1 

0-040 

0-062 + 0-011 

0-072 

(1941) 

6t 

4-00 

■ 

0-030 

0-088 + 0-010 

0-088 


* Type III curve. f Exponential curve. 


Approximate probabilities correct to for a single tail will be obtained by applying to 
the normal theory value necessary corrections due to A 3 , A 4 , A| and A 5 , A 3 A 4 , A|. But'when 
both tails are used, corrections for A 4 and A§ only will furnish the same order of approximation 
since the contributions due to odd-order cumulants cancel each other, i.e. the use of p(t), 
which provides an approximation correct to n~ l only, will actually lead for a two-sided 
test to results correct to n~K This favours the application of our formula for p(t) to samples 
of moderate size from populations whose degree of departure from normality is not necessarily 
small. 


















A. K. Gayen 


365 


Results of investigations made by various writers support the conjecture that p(t) or 
q(t) may have a quite extended range of applicability. Table 2 shows the values of A? and A 4 
of the sampled populations, the size of the samples considered, the normal theory probability , 
P 0 (t 0 ), and the experimental determinations with their standard errors, of the approximate 
probability P(t 0 ) against their values obtained from p(t). The agreement is quite good, even 
in cases where samples of five or six only have been taken from markedly non-normal popula¬ 
tions. One of the two sampled populations of Nair is a very skew Type III Pearson curve, and 
the other is the exponential curve. These populations, as is well known, can hardly be 
, represented adequately by the third or fourth approximation to the law of error. 

The agreement between experimental and theoretical results, especially for the two 
populations considered by Nair, may be partly due to the fact that the approximation yielded 
by p{t) is, as we have noted, not only correct to nr 1 but also to , since both tails have been 
used for these cases. 

For certainvalues of A|and A 4 , estimates of P(t 0 ) at 1 1 0 j = 3, for samples of 5 and 6, obtained 
from formula (4-2), are shown in Tables 3 and 4. Numerical values of A! and A 4 have been 
seleoted to include a fairly wide class of non-normal populations met with in practice. In 
spite of the fact that there is a fair agreement between the tabulated results and the experi¬ 
mental determinations, it is obviously .unwise to assume that they necessarily furnish in all 
cases a satisfactory estimate of the true probability of t for such small samples. 

The forms of the two approximations, namely (4-2), correct to w 1 , and (5-3), to n~-, are 
different from those which may be correspondingly obtained from Geary’s (1947) asymptotic 
expansion to nr 2 (his result (2-24)). He has obtained that expression by using Charlier’s 
‘Differential Series’ with the normal theory t as the generating function, utilizing for the 
purpose his derived results for the first six cumulants of non-normal f. One of the advantages 
of Geary’s expression for t is that it takes into account some of the higher cumulants and 
higher powers of A 3 and A 4 . Accordingly it may have an extended range of applicability. 
But the fact that it agrees, to order n~ l , with M. S. Bartlett’s (1935) result suggests that the 
formula is not satisfactory, since the asymptotic character of the latter cannot be assumed. 
For Bartlett specified his parent population by the first three terms of the Edgeworth series 
which take no account of the term in A§, namely, lOA|0 (6) (o;)/6!. The estimates of true 
probabilities shown in Geary’s Table 2 cannot as such be regarded as satisfactory. 


8. Probability corrections dub to higher cumulants or the 

PARENT POPULATION 

For the effects of A 6 , A 3 A 4 , A? we consider as before the probability integrals of the corrective 
functions and denote them by P Xs (t 0 ), P AaAi (< 0 ) and P^{t Q ) respectively. We then have 


A,(*o) = - Px B (t)dt . 

J CO J ta 


1 + 


2(4n-3) f , t (2»»-2»+l) f ; 


(n-l) 


(n— l) 2 


/ & \l( ,i + 3 > 

40. rfi i 1 + ^——A 

(?i-l)(n-4)). f n + 3 \ l (n-l){n-2) \ / n +1 \ 

2 ’ J \ 10 ^( 271 )# j H 2 ’ / 


20^/(27r)# 




(2w 2 — 2n+ 1) 


40 n * 


t,(V4 


( 8 - 1 ) 


(8-1 bis) 



366 


Distribution of 4 Student's ’ t from non-normal universes 


Table 3. Showing comparative values of 

rt„ 


%' 


1- 


p(t)dt = 2[P 0 (4)-A i P / , 1 (i 0 ) + A 3 2 ^ 0 )] J 


near the 4 % point (approximately) of normal theory t for samples of 5 ( n' — degrees of 
freedom = 4) * 


V A! 
a\ 


0-20 

0-25 

0-50 

1-0 

1-5 

2-0 

4-0 


mm 








-1-5 


0-0527 

0-0539 

0-0598 





Hp 



0-0512 

0-0572 

0-0691 





Kg! 1 

0-0474 

HI 

0-0545 

0-0665 

0-0784 



■ 


0-0447 

0-0469 

0-0519 

0-0638 

0-0758 





0-0421 

0-0433 

0-0492 

0-0612 

0-0731 



SlBfllf 

0-0346 



0-0466 

0-0585 

0-0705 



1-5 


0-0308 

IH 

0-0439 

0-0659 

0-0678 




1 

0-0341 

0-0363 

0-0413 

0-0532 

0-0652 

0-0771 

0-1249 


■m 

0-0288 



0-0479 

0-0699 

00718 

0-119(1 


0-0187 

0-0235 

0-0247 


0-0426 

0-0646 

0-0605 

0-1143 



.0-0129 

0-0141 

0-0201 

0-0320 

0-0440 

0-0569 

0-1037 


Table 4. Showing comparative values of 



1 

p(t) dt — 2[P)(2 0 ) — A iPxftg) + Ajji^j^o)]) 


near the 3 % point ( approximately ) of the normal theory t for samples of 6 ip! = degrees 
of freedom = 5) 


\ A| 

0-00 

0-20 

0-26 

0-6 

1-0 

1-6 ' 

2-0 

4-0 

a\ 









-2-0 

0-0386 








-1-6 

0-0365 

0-0406 

0-0417 

00468 





-1-0 

0-0343 

0-0386 

00393 

0-0447 





-0-5 

0-0322 

0-0.364 

0-0374 

0-0426 

■ I 




0-0 

0-0301 

0-0343 

0-0363 


■ 

0-0613 

0-0717 


0-5 

0-0280 

0-0321 

0-0332 



0-0592 

0-0696 


1-0 

0-0259 

0-0300 

0-0311 


0-0467 

0-0571 

0-0674 


1-6 

0-0237 

0-0279 

0-0289 



0-0549 

0-0653 


2-0 

0-0216 

0-0268 

0-0268 



0-0528 

0-0632 

0-1048 

3-0 

0-0174 

0-0216 

0-0226 

0-0278 


0-0486 


■OMjB 

4-0 

0-0132 

0-0173 

0-0184 

0-0236 

■ 

0-0443 

0-0547 


6-0 

0-0047 

0-0088 

0-0099 

0-0151 

llljl 

0-0359 

0-0463 

■ 































A. K. Gayen 


367 


P»*JM = - f ^W*)* = f 

J CO v to 

- (O /,(—, *) - (».») 4(ht • 3 ) + <”»> J -(T ■ 2 ) - <”<■» 4.(^, i), 

where (m 34 ), (»«)> K 2 ) and Ki) are given by 


j(tor)n* J ' 


I (»—i).bI 

<"»> " | —288' J(«» 3 + 2S8»> + 182» + 51), 




Ns) = l 9flV(2ff)n* J ( 150w3 + 896 ^ + 869m+297), 

' n + *.2)l 


5A 


(^ 52 ) — 


( n ei) ~ 


(68m 2 + 311m+ 237), 


(25m+ 59), 


96^/(2tt) n* 
P\po) = = - J ( 2>aj (t)dt 


where 


{(»-i)^(V> 6 )i 


(»«) = 2592 vW)n* J ( 64 ^+952m 8 + 3578m 2 + 3011m+ 996), 

(^54) = 

; iti a j 

3 )) 

(252m 3 + 2821m 8 + 8564m + 7077), 


1 22 g v /( 2 7r) - 7 f} j (144m 4 + 1918m 8 + 701 lm 8 +7550m+ 2913), 


M 2 ^. 3)1 

(»oo> - \ 90A y(2„)„ t J 1 


(w 7a ) — 


ii A 7 _/|(541m 8 + 2086m+3049), 


/ 3 5 j b(^, 1 )\ 
_ \ 96V(2m)mi / 


(3m+13), 


with the usual notation for complete and incomplete B-functions.

Biometrika 36 


( 8 - 2 ) 


(8-3) 


K S )4o(^. 2 ) — (m 81 )* l), (8‘4) 


( 8 * 6 ) 



The additional formulae (8.1), (8.2) and (8.4) of this section enable us to deduce the probability of t when the parent population can be represented by the fourth approximation. The corrected probability P(t_0) for a single tail may be obtained from the formula

$$P(t_0) = P_0(t_0) + \lambda_3 P_{\lambda_3}(t_0) - \lambda_4 P_{\lambda_4}(t_0) + \lambda_3^2 P_{\lambda_3^2}(t_0) + \lambda_5 P_{\lambda_5}(t_0) + \lambda_3\lambda_4 P_{\lambda_3\lambda_4}(t_0) + \lambda_3^3 P_{\lambda_3^3}(t_0). \qquad (8.6)$$

It should be noted that for the negative tail of the distribution P_{λ3}(t_0) and P_{λ3³}(t_0) are positive, whereas P_{λ5}(t_0) and P_{λ3λ4}(t_0) are negative, the reverse being the case for the other tail.

For samples of 10 from such a population the probability at t_0 = −2.262 (here |t_0| is the 5 % point of normal-theory t) is given by

$$P(-2.262) = 0.0250 + \lambda_3(0.0209) - \lambda_4(0.0015) + \lambda_3^2(0.0092) + \lambda_5(-0.0017) + \lambda_3\lambda_4(-0.0111) + \lambda_3^3(0.0221).$$

If λ3 = 1 and λ4 = 1, then for a Pearson curve λ5 = 0.25, and in that case the probability is found to be 0.0642, whereas the estimate from terms as far as λ3² only is 0.0536. Thus, for a single tail, both differ considerably from the normal-theory probability 0.0250.
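As an arithmetic check on the example above, the sketch below evaluates the single-tail probability at t_0 = −2.262 for samples of 10 from the six tail contributions quoted in the text. The function name `tail_probability_n10` and its `order` switch are illustrative assumptions, and the coefficients apply only to this particular t_0 and sample size.

```python
# Tail contributions at t0 = -2.262 for samples of 10, as quoted in the text.
P0 = 0.0250          # normal-theory single-tail probability
P_L3 = 0.0209        # coefficient of lambda_3
P_L4 = 0.0015        # enters with a minus sign: -lambda_4 * P_L4
P_L3SQ = 0.0092      # coefficient of lambda_3**2
P_L5 = -0.0017       # coefficient of lambda_5
P_L3L4 = -0.0111     # coefficient of lambda_3 * lambda_4
P_L3CUBE = 0.0221    # coefficient of lambda_3**3


def tail_probability_n10(l3, l4, l5=0.0, order=4):
    """Single-tail P(t <= -2.262). order=3 stops at the lambda_3**2 term;
    order=4 adds the lambda_5, lambda_3*lambda_4 and lambda_3**3 terms of (8.6)."""
    p = P0 + l3 * P_L3 - l4 * P_L4 + l3**2 * P_L3SQ
    if order >= 4:
        p += l5 * P_L5 + l3 * l4 * P_L3L4 + l3**3 * P_L3CUBE
    return p


# Pearson-curve case from the text: lambda_3 = lambda_4 = 1, lambda_5 = 0.25.
print(round(tail_probability_n10(1, 1, order=3), 4))
print(round(tail_probability_n10(1, 1, 0.25, order=4), 4))
```

With order=3 the sum reproduces 0.0536, and the fourth-approximation terms raise it to about 0.0642, matching the two figures in the text.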


Summary 

The theoretical distribution of ‘Student’s’ t in non-normal samples of any size has been derived for a parent population specified by a number of terms of the Edgeworth series. In addition to the normal-theory frequency function of t, it contains corrective terms due to the cumulants λ3, λ4, λ3² and λ5, λ3λ4, λ3³. It is assumed that the values of the population λ's are known.

If the population can be well represented by the third or the fourth approximation (in which case the next higher λ's and higher powers of λ3 and λ4 are small), then the corresponding t-distributions will be accurate for any size of sample, as far as terms of similar order in the λ's.

The two expressions for t, namely the one including terms up to that in λ3² and the other up to that in λ3³, are asymptotic to order $n^{-1}$ and $n^{-3/2}$ respectively, and as such they afford closer approximations to the actual distribution of t for large samples from any universe. The probabilities calculated from the former will be the same as those from the latter if both tails of t are used. So it is not unlikely that, for moderate sizes of sample, the former may have an extended range of applicability. The satisfactory agreement between the theoretical values and the experimental results of various writers appears to support this conjecture.

Tail-area probabilities of the corrective terms for λ3, λ4 and λ3² have been tabulated. An example is given to illustrate the use of the tables. Diagrams showing the frequency curve of non-normal t for certain values of n have been constructed.

The probability corrections due to higher cumulants of the parent population have also been considered for a representative value of n.

I wish to acknowledge my indebtedness to Dr H. E. Daniels for his kind advice and criticism in the course of my investigations, and to Dr J. Wishart for suggesting a number of improvements to the paper. My thanks are due to Mr D. A. East for drawing the diagrams.




REFERENCES 




Bartlett, M. S. (1935). Proc. Camb. Phil. Soc. 31, 223.

Bowley, A. L. (1928). F. Y. Edgeworth's Contributions to Mathematical Statistics. London: Royal Statistical Society.

Cramér, H. (1928). Skand. AktuarTidskr. 11, 13, 141.

Cramér, H. (1946). Mathematical Methods of Statistics. Princeton University Press.

Edgeworth, F. Y. (1906). Trans. Camb. Phil. Soc. 20, 36, 113.

Edgeworth, F. Y. (1906). J.R. Statist. Soc. 69, 497.

Geary, R. C. (1936). J.R. Statist. Soc. Suppl. 3, 178.

Geary, R. C. (1947). Biometrika, 34, 209.

Nair, A. N. K. (1941). Sankhyā, 5, 393.

Neyman, J. & Pearson, E. S. (1928). Biometrika, 20A, 175, 263.

Pearson, E. S. (1928, 1929). Biometrika, 20A, 356 and 21, 259.

Rietz, H. L. (1939). Ann. Math. Statist. 10, 265.


24-2 





THE COMBINATION OF PROBABILITIES ARISING FROM DATA 
IN DISCRETE DISTRIBUTIONS 

By H. O. LANCASTER, Rockefeller Fellow in Medicine

Introductory

1. Two common methods of combining probabilities from different experiments make use of the additive properties of χ². Thus the results of the various experiments may each be expressed as a standardized normal deviate or as its square. In these cases the probabilities are readily combined by the summation of χ². In other cases there may be obtained the probability, on the null hypothesis, that a result as divergent as the observed, or one more divergent, would occur. Such probabilities may be combined in a simple way by the use of the transformation χ² = −2 log_e P. No difficulties are met in data from continuous populations, nor in discrete populations where many different observations are possible, such as occur with large samples. However, with discrete populations, as, for example, the binomial with low index or the fourfold table with small numbers, biases arise which may diminish the power of the tests. This is most easily seen in the case of the binomial if independent values of χ_c², that is, χ² corrected for continuity, are summed. As these biases have not previously received much attention, we examine the problem in some detail. We suggest that, for χ² with one degree of freedom, only the crude χ² is suitable for summation. No obvious modification of the probability integral transformation is available, but we suggest the use of the ‘mean value χ²’ or an approximation to it, the ‘median value χ²’. The former, when the null hypothesis is true, has an expectation rigorously equal to the theoretical value of 2 and a variance slightly less than the theoretical value of 4.

We discuss, first, the case of the binomial with low index, where an enumeration of all 
possible events and their relative frequencies of occurrence under the null or other hypothesis 
is practicable. Then we pass on to the more difficult case of the fourfold tables where the 
number of possible events is greatly increased, even with the simplified conditions which 
we have selected. 

The binomial 

2. Notation. In the discussion of the binomial, we shall take p and q to be the probabilities of success and failure of an individual observation as specified by the null hypothesis, n the number of trials, m the observed number of successes, and p' and q' the corresponding probabilities in the actual population which is being sampled. We shall use P in the usual sense of the calculated probability of the observation and all observations more extreme, and P' for the probability of these more extreme observations alone, so that (P − P') is the probability (i.e. the relative frequency of occurrence) of the observed result. Occasionally, as in (1), it will be necessary to treat P as a continuous variable. In connexion with the probability integral transformation, we define the mean value χ², (χ²_m), and the median value χ², (χ'²_m), as follows:

$$\chi^2_m = \int_{P'}^{P} (-2\log_e P)\,dP \Big/ (P - P') = 2 - 2\{P\log_e P - P'\log_e P'\}/(P - P'), \qquad (1)$$

$$\chi'^2_m = \begin{cases} -2\log_e \tfrac12 (P + P'), & \text{if } P' \neq 0,\\ 2 - 2\log_e P, & \text{if } P' = 0. \end{cases} \qquad (2)$$
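Equations (1) and (2) are easy to evaluate directly. The sketch below assumes P and P' are given as floats with 0 ≤ P' < P ≤ 1. It codes both definitions and checks (1) against a brute-force midpoint-rule average of −2 log_e u over P' ≤ u ≤ P, which is what the integral in (1) expresses. The helper names are illustrative.

```python
import math


def xlogx(x):
    """x * ln(x), taking the limiting value 0 at x = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def mean_value_chi2(P, P_prime):
    """Equation (1): mean of -2 ln u for u uniform on [P', P]."""
    return 2 - 2 * (xlogx(P) - xlogx(P_prime)) / (P - P_prime)


def median_value_chi2(P, P_prime):
    """Equation (2): -2 ln of the interval midpoint, or 2 - 2 ln P when P' = 0."""
    if P_prime == 0:
        return 2 - 2 * math.log(P)
    return -2 * math.log(0.5 * (P + P_prime))


def midpoint_average(P, P_prime, steps=100_000):
    """Brute-force check of (1): midpoint-rule average of -2 ln u over [P', P]."""
    h = (P - P_prime) / steps
    return sum(-2 * math.log(P_prime + (j + 0.5) * h) for j in range(steps)) / steps


print(mean_value_chi2(0.3125, 0.0625), midpoint_average(0.3125, 0.0625))
print(median_value_chi2(0.3125, 0.0625))
```

When P' = 0, (1) reduces to 2 − 2 log_e P, which is why the second branch of (2) takes that value.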



H. O. Lancaster




3. A note on the correction for continuity in the case of the binomial distribution. Yates (1934) made a definite, although indirect, statement on the use of his continuity correction for the combination of probabilities from different experiments in the following words:

‘P(χ') [i.e. our P(χ_c)] gives 0.1427, an excellent approximation, whereas P(χ) gives 0.0612, which, though not in itself attaining significance, is less than half its true value; this would be exceedingly misleading if a number of such probabilities from different classes of experiment were to be combined.’ D. Mainland (1948) takes this to imply that χ_c², and not the crude χ², is to be summed, a fallacious practice already rejected by Cochran (1942). We may note that

$$E(\chi_c^2) = E(|m - np| - \tfrac12)^2/(npq) = E(m - np)^2/(npq) - E|m - np|/(npq) + 1/(4npq) = 1 + 1/(4npq) - (\text{mean deviation})/(\text{variance}). \qquad (3)$$

If np is integral, the expectation is less than that given above by a quantity equal to the frequency of the modal term divided by four times the variance. We have always here diminished |m − np| by ½. This will result in a change of sign if |m − np| is less than ½, but, in general, it is obvious that only a trivial and non-systematic difference will be made to the expectation by the alternative method of not diminishing |m − np| by ½ if it be less than ½, and the method used here has rendered the algebra much easier. If neither n, np nor nq be small, this bias due to the use of the continuity correction is trivial, but then the continuity correction would be unnecessary; in those cases for which the correction is said to be essential, the mean deviation may be of the same order of magnitude as the variance or, at any rate, not a small fraction of it, and so bias will result. We note, for example, that in the binomial (½ + ½)² the expectation of χ_c² is 0.25 if the null hypothesis be true. Many will regard this as an unnecessarily extreme case, but it is not as extreme as Mainland's example (on his p. 54), which we quote in § 9 below.
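Equation (3) can be checked by direct enumeration. The sketch below computes E(χ_c²) exactly with Python's `fractions` module under two conventions: always subtracting ½ from |m − np| (the convention adopted in § 3), or leaving |m − np| unchanged when it is below ½ (the alternative mentioned there). The function names are illustrative.

```python
from fractions import Fraction
from math import comb


def expected_chi2_c(n, p, always_diminish=True):
    """Exact E[(|m - np| - 1/2)^2 / (npq)] under the binomial with index n."""
    half = Fraction(1, 2)
    npq = n * p * (1 - p)
    total = Fraction(0)
    for m in range(n + 1):
        prob = comb(n, m) * p**m * (1 - p) ** (n - m)
        dev = abs(m - n * p)
        if always_diminish or dev >= half:
            dev -= half          # continuity correction
        total += prob * dev**2 / npq
    return total


def formula_3(n, p):
    """Right-hand side of (3): 1 + 1/(4npq) - (mean deviation)/(variance)."""
    npq = n * p * (1 - p)
    md = sum(comb(n, m) * p**m * (1 - p) ** (n - m) * abs(m - n * p) for m in range(n + 1))
    return 1 + Fraction(1, 4) / npq - md / npq


half = Fraction(1, 2)
print(expected_chi2_c(2, half), formula_3(2, half))     # 1/2 1/2
print(expected_chi2_c(2, half, always_diminish=False))  # 1/4
```

Under the always-subtract convention, the enumeration agrees exactly with (3) for any n and p. For n = 2 and p = ½ the two conventions give ½ and ¼ respectively, both well below the nominal value 1.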

4. The power of the methods of combination of probabilities. In Table 1A we have attempted to give some idea of the power of the tests, in the sense of Neyman & Pearson (1933), for a binomial population with index 5. We note first of all that no single experiment can give a result significant at the 5 % level, so that we are led to some method of combining successive experiments. We have compared the results of (a) simple pooling of the successes and failures, (b) summation of the crude χ², (c) summation of the mean value χ²_m, (d) summation of −2 log_e P, and (e) summation of χ_c², which we find are decreasingly powerful in that order.* We have taken p = 0.5 as the null hypothesis, have enumerated the possible events in populations with various values of p', and have calculated the relative frequency of each event. We have then been able to calculate the frequency with which we may expect results significant at the 5 % level when combining two, three or four experiments; in the rows indicated by the symbol ∞, we have shown the limits to which the results tend if very many samples are combined. Thus we have been able to show that, if the null hypothesis were true in the population sampled, it would be rejected in 4.3 % of cases by combining two experiments using the summation of the crude χ², whereas if the p' of the population sampled were 0.25 or 0.75, it would be rejected in some 25.2 % of cases, and so on. If a χ² has an expectation below its appropriate theoretical value of either 1 or 2, respectively, then we

* In all cases we have used a two-sided test; e.g. where the null hypothesis gives the binomial (½ + ½)⁵, we have pooled together results with 0 or 5, 1 or 4, and 2 or 3 successes.



372 Combination of probabilities 

may assume that in an indefinitely large number of experiments the value of χ² obtained is below that of the 5 % significance level in practically every set of experiments combined, and so we have assumed that no significant results will be obtained in these cases. However, if the expectation is above the theoretical value, then we have assumed that a summation of the χ² from an indefinitely large number of experiments combined will always yield a significant result. Thus we find that, if p' = 0.33, the expected value of (−2 log_e P) is 1.480, and so, if the experiment were to be repeated a thousand times, we should have a χ² of approximately 1480 with 2000 degrees of freedom and should have no ground for rejecting the null hypothesis. In this example we have scored the rejection rate as zero. If, on the other hand, the expectation of χ² were above 2, as in the case where p' = 0.2, we should expect every indefinitely long series of experiments to lead to the rejection of the null hypothesis. As was to be anticipated, simple pooling was found to be the most powerful method of combination. The crude χ² and χ²_m were equally powerful in this particular group of populations, whereas χ_c² and the probability integral transformation were much less powerful and, in fact, were unable to detect considerable departures from the null hypothesis even with a large number of experiments available. Somewhat similar findings occur with samples from the populations (p' + q')¹⁰ when the null hypothesis specifies that p = 0.5, as can be seen from Table 1B. In these discrete populations we can only give arithmetical results in specific cases, since the use of approximate methods would remove the effect of the discreteness, which is what we desire to study. We have spent some time on the binomial for two reasons: the problems here are more easily studied than in the case of the fourfold table, where the enumeration of cases becomes impracticable, and the binomial is the limiting case of the series involved in the fourfold table when the members of one row become indefinitely large.
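The expectation of −2 log_e P used in this argument can be reproduced by enumeration. The sketch below assumes the two-sided pooling of the footnote to § 4 (0 or 5, 1 or 4, 2 or 3 successes under p = 0.5), so that P = 2/32, 12/32 and 1 respectively. It averages −2 log_e P over the binomial with index 5 and true success probability p'. Taking p' = 1/3 reproduces the figure 1.480 quoted above; with p' = 0.33 exactly, the value is about 1.50. Taking p' = 0.2 gives an expectation above 2. The function names are illustrative.

```python
import math
from math import comb

N = 5  # binomial index of Table 1A


def two_sided_P(m, n=N):
    """Null (p = 0.5) probability of a result at least as extreme as m, both tails pooled."""
    k = min(m, n - m)
    one_tail = sum(comb(n, j) for j in range(k + 1)) / 2**n
    return min(1.0, 2 * one_tail)


def expected_minus_2_log_P(p_true, n=N):
    """E(-2 ln P) when successes really follow the binomial with index n and p = p_true."""
    return sum(
        comb(n, m) * p_true**m * (1 - p_true) ** (n - m) * (-2 * math.log(two_sided_P(m, n)))
        for m in range(n + 1)
    )


for p_true in (0.5, 1 / 3, 0.2):
    print(p_true, round(expected_minus_2_log_P(p_true), 3))
```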

5. The probability integral transformation, χ² = −2 log_e P. We now discuss in greater detail the transformation χ² = −2 log_e P, as introduced by R. A. Fisher (1932). A similar transformation was dealt with by Karl Pearson (1933) and later by E. S. Pearson (1938). None of these authors appears to have considered the effects of discontinuity, but the recent increasing use of very small samples now renders such an inquiry necessary. This transformation gives unbiased results when based on continuous distributions, but it will be shown that it must be used with caution in the case of discrete distributions. Under the null hypothesis, all values of P are considered to be equally likely, so that, if F is the distribution function of P,

$$dF = dP. \qquad (4)$$

Integrating, this gives

$$F = P \quad (0 \le P \le 1), \qquad (5)$$

since the probability of obtaining P = 0 is 0. Then f(P), the frequency function, is given by

$$f(P) = \text{constant} = 1 \quad (0 < P \le 1). \qquad (6)$$

After the transformation y = −2 log_e P, we find that y has the frequency function

$$f(y) = \tfrac12 \exp(-\tfrac12 y), \qquad (7)$$

so that y is distributed as χ² with 2 degrees of freedom. Thus any number of probabilities, P, may be converted by means of this transformation to χ² for 2 degrees of freedom, and, using the additive properties of the χ² distribution, the χ² may be summed together with the degrees of freedom in the usual manner. It is necessary, however, to note that P must be



[Table 1A (binomial with index 5; null hypothesis p = 0.5): the body of the table is not legible in the source scan.]

E indicates the expectation. † χ²_m and χ'²_m give the same results as the crude χ² for the combination of 2 or 3 experiments.




[Table 1B (binomial with index 10; null hypothesis p = 0.5): the body of the table is not legible in the source scan.]



assumed to come from a rectangular population with end-points 0 and 1. Further, in continuous populations the transforms of P and (1 − P) both have a finite expectation. This is not so with discrete populations, as can easily be seen in the case of the binomial, since there is a finite probability of obtaining the result 1 − P = 0, when −2 log_e(1 − P) = ∞. If, moreover, a two-sided comparison is made, so that we always sum the probabilities from the tail in which the observation lies, a further difficulty arises. In order to define which tail to use, we must fix some dividing point, summing from the lower tail if the observation falls below this point and from the upper tail otherwise. In the case of a continuous distribution the most natural point to take is the median of the sampling distribution, and we may then take for P in the transformation double the probability integral calculated from the appropriate tail. This was the procedure suggested by Sukhatme (1935). Mainland (1948) has suggested using such a method when dealing with discrete distributions, but without some adjustment it may lead to values of P greater than unity. It is worth studying some examples to appreciate the difficulties more fully.
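For continuous data, the rule of § 5 amounts to Fisher's method: sum −2 log_e P over k independent experiments and refer the total to χ² with 2k degrees of freedom. A minimal sketch follows. For an even number of degrees of freedom the upper tail of χ² has the closed form e^(−x/2) Σ_{j<k} (x/2)^j / j!, so no statistics library is needed. The function names are illustrative. As the surrounding text explains, this reference distribution is exact only when each P is uniform on (0, 1).

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with even df > x) = exp(-x/2) * sum_{j<df/2} (x/2)**j / j!."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 1.0
    h = x / 2
    # Work with logarithms so that large x does not underflow term by term.
    logs = [-h + j * math.log(h) - math.lgamma(j + 1) for j in range(df // 2)]
    top = max(logs)
    return min(1.0, math.exp(top) * sum(math.exp(v - top) for v in logs))


def fisher_combine(p_values):
    """Return (sum of -2 ln P, combined upper-tail probability on 2k d.f.)."""
    statistic = sum(-2 * math.log(p) for p in p_values)
    return statistic, chi2_upper_tail_even_df(statistic, 2 * len(p_values))


print(fisher_combine([0.10, 0.20]))
```

A single P passed through `fisher_combine` comes back unchanged, which is a convenient sanity check on the tail formula.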

6. Examples of some difficulties arising in the transformation χ² = −2 log_e P. (a) The expectation of χ² is always below 2 for discrete populations when the null hypothesis is true, as we now illustrate in the case of the binomial (½ + ½)⁴. We may tabulate the possible events, their relative frequencies and the corresponding values of P and (−2 log_e P) for a one-sided comparison:

Successes   Relative frequency   Cumulative         −2 log_e P
(m)         of m                 probability (P)
4           0.0625               1.0000             0
3           0.2500               0.9375             0.12908
2           0.3750               0.6875             0.74939
1           0.2500               0.3125             2.32630
0           0.0625               0.0625             5.54518

We conclude that the expectation of χ², i.e. of −2 log_e P, is 1.241 when the null hypothesis, that p = 0.5, is true.

For a two-sided comparison, we obtain the results set out below:

Successes   Probability   Cumulative         −2 log_e P
(m)         of m          probability (P)
0 or 4      0.125         0.125              4.15888
1 or 3      0.500         0.625              0.94001
2           0.375         1.000              0.00000

Here the expectation of χ² is 0.990 instead of the theoretical value of 2. It will be noted that in this case the division point is at m = 2, and the procedure adopted is equivalent to summing from the lower tail on half the occasions when m = 2 and from the upper tail on the other half, and then taking P as double this tail sum.
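Both expectations can be confirmed in a few lines. The sketch below enumerates the binomial (½ + ½)⁴. For the one-sided comparison it takes P as the lower-tail probability. For the two-sided comparison it takes P as twice the smaller tail, capped at 1, which reproduces the value P = 1 assigned to m = 2 above.

```python
import math
from math import comb

n = 4
freq = [comb(n, m) / 2**n for m in range(n + 1)]  # null hypothesis p = 0.5


def lower_tail(m):
    """Pr(successes <= m) under the null hypothesis."""
    return sum(freq[: m + 1])


# One-sided comparison: P is the cumulative probability of the first table.
one_sided = sum(freq[m] * -2 * math.log(lower_tail(m)) for m in range(n + 1))

# Two-sided comparison: double the smaller tail, never exceeding 1.
two_sided = sum(
    freq[m] * -2 * math.log(min(1.0, 2 * lower_tail(min(m, n - m)))) for m in range(n + 1)
)

print(round(one_sided, 3), round(two_sided, 3))  # 1.241 0.99
```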








(b) When, however, the distribution is asymmetric, a further modification is needed. Consider the case of the binomial (⅔ + ⅓)⁵; here the mean lies between m = 1 and m = 2. Col. 3 of the table shows the cumulated probabilities from either tail. In the final column, instead of multiplying by 2, we have adjusted P by dividing by the proportion of observations in the respective tail. This avoids the possibility of having a probability, P, greater than unity.

Successes   Probability   Cumulative probability,   Adjusted
(m)         of m          P, from one tail          P
0           0.131687      0.131687                  0.28571
1           0.329218      0.460905                  1.00000
2           0.329218      0.539095                  1.00000
3           0.164609      0.209877                  0.38931
4           0.041152      0.045267                  0.08397
5           0.004115      0.004115                  0.00763
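The adjusted values in the last column can be reproduced exactly. The sketch below uses exact fractions. An observation below the mean np = 5/3 is referred to the lower tail and any other to the upper tail, and the tail sum is divided by the total probability of that tail, as described above. The names are illustrative.

```python
from fractions import Fraction
from math import comb

n, p = 5, Fraction(1, 3)
probs = [comb(n, m) * p**m * (1 - p) ** (n - m) for m in range(n + 1)]
mean = n * p                                   # 5/3, between m = 1 and m = 2
lower_total = sum(pr for m, pr in enumerate(probs) if m < mean)
upper_total = 1 - lower_total


def adjusted_P(m):
    """Tail sum from the tail containing m, divided by that tail's total probability."""
    if m < mean:
        return sum(probs[: m + 1]) / lower_total
    return sum(probs[m:]) / upper_total


for m in range(n + 1):
    print(m, f"{float(probs[m]):.6f}", f"{float(adjusted_P(m)):.5f}")
```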


(c) As another illustration, take the binomial (⅔ + ⅓)⁶. Here the question arises: how should the observations corresponding to the value m = 2 (the mean) be apportioned? Following the method of the preceding example, we should get different adjusted values for P, and therefore different expectations for −2 log_e P, according to the tail to which we assign samples with m = 2. It would, of course, be possible, but troublesome in practice, to obtain a median division by using some random method of allocating samples having m = 2 to the lower tail with a probability of 0.14884 and to the upper tail with a probability of 0.18038. These two figures sum to 0.32922, the probability of m = 2, and bring each tail to exactly one-half (0.35116 + 0.14884 = 0.31962 + 0.18038 = 0.5).

Successes   Probability   Cumulative probability (P)
(m)         of m          summed from appropriate tail
0           0.08779       0.08779
1           0.26337       0.35116
2           0.32922       ?
3           0.21948       0.31962
4           0.08231       0.10014
5           0.01646       0.01783
6           0.00137       0.00137


7. The mean value χ². To illustrate the solution proposed, suppose we are dealing with samples of 5, that the null hypothesis is that p = ⅓, and that we are using a one-sided test, i.e. we are concerned only with the alternative that p' > ⅓. The following table shows the relevant quantities. On the null hypothesis, y = −2 log_e P can assume the six discrete values given in col. 4, with the corresponding relative frequencies given in col. 2. In Fig. 1 the latter have been plotted as ordinates against the former as abscissae. If P were a continuous variable, then y = −2 log_e P would have the exponential frequency function f(y) of equation (7); this curve is also shown in the diagram. The mean of the frequency curve is at 2. The mean of the discrete distribution (i.e. the sum of the products of the corresponding numbers in cols. 2 and 4 of the table) is 1.314. There is clearly a considerable bias.


Case of binomial (⅔ + ⅓)⁵; one-sided test

i     Relative frequency   Cumulative          y_i =          ȳ_i       −2 log_e ½(P_i + P_{i+1})
      P_i − P_{i+1}        probability P_i     −2 log_e P_i
(1)   (2)                  (3)                 (4)            (5)       (6)
0     0.131687             1.000000            0.0000         0.1379    0.1362
1     0.329218             0.868313            0.2824         0.7213    0.7028
2     0.329218             0.539095            1.2357         2.0329    1.9644
3     0.164609             0.209877            3.1225         4.2788    4.1181
4     0.041152             0.045267            6.1903         7.7108    7.4026
5     0.004115             0.004115            10.9862        12.9862   12.9862

                                   Expectation   Variance
χ² with 2 d.f. (theoretical)       2.000         4.000
y_i                                1.314         2.482
ȳ_i                                2.000         3.689
−2 log_e ½(P_i + P_{i+1})          1.932         3.443

Fig. 1. (a) Frequency curve f(y) = ½e^(−½y) and (b) probabilities for the binomial (⅔ + ⅓)⁵.


In general, suppose that there are s possible successive values of −2 log_e P and that we denote these by

$$y_i = -2\log_e P_i \quad (i = 0, 1, 2, \ldots, s-1). \qquad (8)$$

Call P_0 = 1, y_0 = 0, P_s = 0 and y_s = ∞. Then

$$P_i - P_{i+1} = \int_{y_i}^{y_{i+1}} f(y)\,dy. \qquad (9)$$

Thus the bias in the expectation of y_i arises because the frequency which in the continuous distribution is spread between y_i and y_{i+1} is concentrated, in the discrete distribution, at the lower end of the interval. To correct this, we propose to replace y_i in the test by

$$\bar y_i = \int_{y_i}^{y_{i+1}} y\,f(y)\,dy \Big/ \int_{y_i}^{y_{i+1}} f(y)\,dy \qquad (10)$$

$$= \int_{P_{i+1}}^{P_i} (-2\log_e P)\,dP \Big/ (P_i - P_{i+1}) = 2 - 2\{P_i\log_e P_i - P_{i+1}\log_e P_{i+1}\}/(P_i - P_{i+1}). \qquad (11)$$

It will be seen from equation (10) that we are replacing y_i = −2 log_e P_i by the mean value of y in the interval y_i to y_{i+1}, as given by the frequency function f(y) appropriate to the continuous case. By doing this we ensure that the expectation of ȳ_i is 2, for

$$E(\bar y_i) = \sum_{i=0}^{s-1} (P_i - P_{i+1})\,\bar y_i = \sum_{i=0}^{s-1} \int_{P_{i+1}}^{P_i} (-2\log_e P)\,dP = \int_0^1 (-2\log_e P)\,dP = 2. \qquad (12)$$

Further, the variance of ȳ_i will approach 4 from below as the number of groups is increased. To show this, we note that the variance of the frequency function, i.e. of the distribution of χ² with 2 degrees of freedom, may be split up as follows:

$$\int_0^\infty f(y)(y-2)^2\,dy = \sum_{i=0}^{s-1} \int_{y_i}^{y_{i+1}} f(y)(y-2)^2\,dy = \sum_{i=0}^{s-1} \left\{ \int_{y_i}^{y_{i+1}} f(y)(\bar y_i - 2)^2\,dy + \int_{y_i}^{y_{i+1}} f(y)(y - \bar y_i)^2\,dy \right\} = \sum_{i=0}^{s-1} (P_i - P_{i+1})(\bar y_i - 2)^2 + \sum_{i=0}^{s-1} \int_{y_i}^{y_{i+1}} f(y)(y - \bar y_i)^2\,dy. \qquad (13)$$

The cross-product terms vanish because ȳ_i is the mean of y over its own interval. The first expression on the right-hand side of (13) is the variance of ȳ_i, which therefore falls short of the variance of χ², i.e. of 4, by the second term on the right-hand side of (13). This latter expression depends on the width of the intervals of y imposed by the null hypothesis and will usually be small, unless there are very few groups.

We see from (11) that, in the event of the observation being the most extreme possible,

$$\bar y_{s-1} = 2 - 2\log_e P_{s-1}, \qquad (14)$$

since P_s = 0.

If we write χ²_m for ȳ_i, P for P_i and P' for P_{i+1}, we have the form of the mean value given at the beginning of this paper in equation (1). The distribution of χ²_m, or ȳ_i, is of course still discrete. However, since it has an expectation of 2 and a variance slightly under 4, it may be expected that, if k independent samples are combined by summing the values of χ²_m, the resulting statistic will be distributed approximately as χ² with 2k degrees of freedom. The bias involved in summing −2 log_e P has, at any rate, been eliminated.

In order to avoid the tedious calculations necessary for χ²_m, we have defined

$$\chi'^2_m = -2\log_e \tfrac12 (P_i + P_{i+1}) = -2\log_e \tfrac12 (P + P') \qquad (15)$$

for all values of P' except P' = 0. In this case χ²_m has the simple form given by equation (14), and we assign this value also to χ'²_m. This slightly reduces the bias of χ'²_m, which has an expectation a little below 2. It is found, for the ordinary sets of probabilities which may arise in practice, that χ²_m and χ'²_m approximate closely. Their expectations for a number of cases are given in Tables 1A and 1B, and are seen to be approximately equal. For the example considered at the beginning of the present section, the individual values of χ²_m = ȳ_i and χ'²_m = −2 log_e ½(P_i + P_{i+1}) are shown in cols. (5) and (6) of the table on p. 377; expectations and variances are compared below the table.
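The columns and summary rows of the table on p. 377 can be recomputed as follows. This sketch assumes the one-sided upper-tail P_i = Pr(m ≥ i) under the binomial (⅔ + ⅓)⁵. It appends P_s = 0 so that the mean-value formula and the P' = 0 rule for the median value apply to the last group. Because it uses unrounded probabilities, its results may differ from the printed figures in the last decimal place.

```python
import math
from math import comb

n, p = 5, 1 / 3
freq = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]  # col. (2)
P = [sum(freq[i:]) for i in range(n + 1)] + [0.0]                      # col. (3), with P_s = 0
P[0] = 1.0  # guard against rounding in the sum


def xlogx(x):
    return 0.0 if x == 0 else x * math.log(x)


y = [-2 * math.log(P[i]) for i in range(n + 1)]                         # col. (4)
y_mean = [2 - 2 * (xlogx(P[i]) - xlogx(P[i + 1])) / (P[i] - P[i + 1])   # col. (5), eq. (11)
          for i in range(n + 1)]
y_median = [2 - 2 * math.log(P[i]) if P[i + 1] == 0                     # col. (6), eq. (2)
            else -2 * math.log(0.5 * (P[i] + P[i + 1])) for i in range(n + 1)]


def moments(values):
    """Expectation and variance of a column under the null frequencies."""
    e = sum(f * v for f, v in zip(freq, values))
    return e, sum(f * (v - e) ** 2 for f, v in zip(freq, values))


for name, col in (("y_i", y), ("mean value", y_mean), ("median value", y_median)):
    e, v = moments(col)
    print(f"{name:>12}: expectation {e:.3f}, variance {v:.3f}")
```

The mean-value column has expectation 2 up to floating-point error, as (12) requires, while the crude y_i column shows the bias of about 0.69.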



The fourfold table 

8. In considering the problem for a fourfold table, we shall illustrate the position on the case where it is supposed that samples of ten are drawn randomly from two populations in which the chances of an individual possessing a certain character are p = (1 − q) and p' = (1 − q'), respectively. The sample results will therefore be recorded as shown below. On the assumption that p = p', and within any set of samples for which a + b = constant, a follows a distribution of hypergeometric type with

E(a) = ½(a + b),   variance of a = (a + b)(20 − a − b)/76.

              No. with character   No. without   Totals
1st sample    a                    10 − a        10
2nd sample    b                    10 − b        10
Totals        a + b                20 − a − b    20

The definitions of the quantities P, P', χ²_m, χ'²_m given in § 2 for the binomial can therefore be extended to apply to the hypergeometric. Further, for the crude χ² and χ_c² we have

$$\chi^2 = 19(a-b)^2/\{(a+b)(20-a-b)\},$$

$$\chi_c^2 = 20(|a-b| - 1)^2/\{(a+b)(20-a-b)\}.$$

To compute −2 log_e P we calculate the sum of the hypergeometric terms in the usual way, assuming a + b fixed, and make a two-tailed comparison. We note that under the null hypothesis, p = p',

$$E(\chi^2) = 1, \qquad (17)$$

$$E(\chi_c^2) = 20\{1 - (\text{mean deviation} - \tfrac14)/(\text{variance})\}/19. \qquad (18)$$
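Equation (17) and the shortfall in E(χ_c²) can be checked by enumerating all 121 possible tables. The sketch below weights each table by the product of the two binomial probabilities with p = p', as in (19) below. It applies the crude χ² and χ_c² formulas above literally, so a table with a = b contributes 20/{(a + b)(20 − a − b)} to χ_c². The two degenerate tables with a + b = 0 or 20, where both statistics are undefined, are scored as zero; this convention is an assumption. Table 2 reports 0.482 for E(χ_c²) when p = p' = 0.5, and the sketch can be compared with it.

```python
from math import comb


def fourfold_expectations(p):
    """E(crude chi2) and E(chi2_c) for two samples of ten, both with success probability p."""
    e_crude = e_corrected = 0.0
    for a in range(11):
        for b in range(11):
            prob = (comb(10, a) * p**a * (1 - p) ** (10 - a)
                    * comb(10, b) * p**b * (1 - p) ** (10 - b))
            k = a + b
            if k in (0, 20):
                continue  # degenerate table: statistics undefined, scored as 0 (assumption)
            denom = k * (20 - k)
            e_crude += prob * 19 * (a - b) ** 2 / denom
            e_corrected += prob * 20 * (abs(a - b) - 1) ** 2 / denom
    return e_crude, e_corrected


crude, corrected = fourfold_expectations(0.5)
print(f"E(crude chi2) = {crude:.3f}, E(chi2_c) = {corrected:.3f}")
```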

If p ≠ p', the expected frequencies of a and b will be given by the terms of the product of the two binomials (p + q)¹⁰ and (p' + q')¹⁰. Thus

$$P(a, b \mid p, p') = {}^{10}C_a\, p^a q^{10-a} \times {}^{10}C_b\, p'^b q'^{10-b}. \qquad (19)$$

Using (19) we can compute, for given p and p', the probability or relative frequency of each fourfold table that can arise, and hence obtain the expectations of χ², χ_c², etc. Results so obtained are given in Table 2. It will be seen that the expectation of χ_c², and also that of the probability integral transformation (−2 log_e P), is always less than the theoretical value when the null hypothesis is true. The advantage of using χ rather than χ_c in repeated sampling was discussed from a rather different angle in connexion with the fourfold table by E. S. Pearson (1947, pp. 151-6). The expectations of χ_c² and of −2 log_e P are also less than the theoretical value when there is some moderate departure from the null hypothesis. The crude χ² as defined by (16) and χ²_m both have an expectation of the theoretical magnitude when the null hypothesis is true, and an expectation above the theoretical value, 1 or 2 respectively, when it is false. In the case of the binomial, we were able to compare the effect of using various methods of combination by an enumeration of cases and a calculation of the probabilities of obtaining a result significant at the 5 % level when 2, 3, 4 and a very large number of







Table 2. A table to show the expectations of the various χ² calculated from the fourfold tables as formed by the methods of this article

Null         Value   Value    Expectations of
hypothesis   of p    of p'    χ_c²    Crude χ²   −2 log_e P   χ²_m
True         0.50    0.50     0.482   1.000      1.040        2.000
             0.45    0.45     0.480   1.000      1.036        2.000
             0.40    0.40     0.471   1.000      1.021        2.000
             0.33    0.33     0.448   1.000      0.984        2.000
             0.25    0.25     0.387   1.000      0.904        2.000
False        0.55    0.25     1.716   2.738      2.989        4.292
             0.50    0.25     1.326   2.229      2.408        3.631
             0.45    0.25     1.009   1.806      1.900        3.080
             0.60    0.40     1.009   1.761      1.004        3.007
             0.55    0.45     0.612   1.188      1.267        2.266


experiments are combined. In the case of the fourfold table the computational work would be prohibitive, so we have calculated a number k such that k × E(χ²) is greater than the 5 % point of the tabulated χ² for the appropriate number of degrees of freedom. In other words,


Table 3

Value   Value   Values of k for different methods of combination
of p    of p'   Crude χ²        −2 log_e P   Mean value χ²_m   χ²_c
0.55    0.25    3               14           3                 14
0.50    0.25    5               100          6                 60
0.45    0.25    11              *            12                10,000
0.60    0.40    13              *            13                10,000
0.55    0.45    200 (approx.)   *            300 (approx.)     *

* In these cases the test cannot be said to have power.


we are calculating how many experiments we shall have to combine to give our test a power
of 50 %. This assumes that, when a number of experiments are added together in this way, there is
a probability of 0.5 of obtaining a total greater than its expectation. The results are given
in Table 3. For example, when p = 0.55 and p' = 0.25, we obtain from Table 2:


k     k × E(−2 log_e P)    Degrees of       5 % point of χ²
      = k × 2.989          freedom (2k)     for 2k degrees of freedom
13    38.857               26               38.885
14    41.846               28               41.337











381 


H. O. Lancaster

We conclude that there is not a 50 % chance of establishing significance at the 5 %
level, using −2 log_e P, until fourteen samples are available for combination. On the other
hand, for these same values of p and p', using crude χ² and combining three samples, we find
that 3 × E(χ²) = 3 × 2.738 = 8.214. This is greater than the 5 % significance level of χ² for
3 degrees of freedom, namely 7.815.

Note that the crude χ² is slightly more powerful than the mean value χ²_m, but
that both are very much superior to χ²_c and to the χ² obtained from the usual form of the
probability integral transformation. Where the expectation of χ² in the fourfold tables is below
the theoretical value, we have assigned no value to k. In such cases an indefinite
repetition of the experiment would clearly not add to the power of the test.
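The rule used to construct Table 3 can be sketched in Python. This is a minimal sketch under stated assumptions, not the author's procedure. It takes k to be the smallest number of experiments for which k × E(χ²) exceeds the upper 5 % point of χ² on kν degrees of freedom, with ν = 1 for crude χ² and ν = 2 for −2 log_e P or χ²_m. The χ² percentage points are computed from the series for the incomplete gamma function rather than read from tables, and the function names are our own.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square on df degrees of freedom > x).

    Uses the power series for the regularized lower incomplete gamma
    function P(df/2, x/2), which is adequate for the arguments needed here.
    """
    if x <= 0.0:
        return 1.0
    a, y = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= y / (a + n)
        total += term
    lower = math.exp(-y + a * math.log(y) - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def chi2_upper_point(alpha, df):
    """The upper 100*alpha % point of chi-square, by bisection on chi2_sf."""
    lo, hi = 0.0, df + 20.0 * math.sqrt(2.0 * df) + 20.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def smallest_k(expectation, df_per_test, alpha=0.05, k_max=1000):
    """Smallest k with k * E(chi^2) above the alpha point on k * df_per_test d.f.

    Returns None when E(chi^2) does not exceed its null value df_per_test,
    since repetition then adds no power (the asterisks of Table 3).
    """
    if expectation <= df_per_test:
        return None
    for k in range(1, k_max + 1):
        if k * expectation > chi2_upper_point(alpha, k * df_per_test):
            return k
    return None
```

With the expectations of Table 2 for p = 0.55, p' = 0.25, `smallest_k(2.989, 2)` and `smallest_k(2.738, 1)` return 14 and 3, the values worked out in the text. Any expectation at or below its null value returns `None`.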

9. An instructive example. The following fourfold table is taken from Mainland (1948,
p. 54). The animals are from his ‘group A’.


                   Died   Survived   Total
Treated animals     0        12        12
Control animals     2        10        12
Total               2        22        24


Among all samples with these margins, the exact probability that a = 0 on the
null hypothesis is 11/46, or 0.2391304, and the corresponding value of χ²_c is 6/11 = 0.5455. Over
all possible tables with this set of marginal totals, E(χ²_c) is only 0.2609 if the null hypothesis
is true. Mainland, however, comments: ‘Again it appears that the treatment tends to
reduce mortality, but in neither group, with P = 0.025 as the standard, is the difference
significant. Group A, n = 12, P = 0.2391.’ On the next page he goes on to use this value of
χ²_c and a similar one from group B to obtain a χ² for two degrees of freedom. Yet the
observation recorded is the furthest possible from the null hypothesis. Were we to obtain it
repeatedly, we should soon obtain a significantly low value of χ². Such
usages cannot be defended. In the example above, −2 log_e P has an expectation of only
0.9694 for the two-tailed test, instead of the theoretical 2. However, we could calculate
a mean value χ²_m, which will have an expectation of exactly 2 on the null hypothesis.
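The figures quoted for Mainland's group A can be checked exactly with rational arithmetic. The sketch below enumerates the tables with the given margins under the null hypothesis and applies Yates's correction; the helper names are ours. It assumes, as the value 0.2609 requires, that a table with |ad − bc| ≤ N/2 is given χ²_c = 0 rather than a positive value.

```python
from fractions import Fraction
from math import comb


def fourfold_null_distribution(row1, row2, col1):
    """Null (hypergeometric) distribution of the top-left cell a.

    row1 and row2 are the row totals and col1 the first column total. All
    margins are held fixed. Returns {a: P(a)} as exact fractions.
    """
    total = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return {a: Fraction(comb(row1, a) * comb(row2, col1 - a), total)
            for a in range(lo, hi + 1)}


def yates_chi2(a, row1, row2, col1):
    """Continuity-corrected chi-square for the table with top-left cell a.

    Assumption: the correction never takes |ad - bc| below zero, so tables
    with |ad - bc| <= N/2 receive the value 0.
    """
    b, c = row1 - a, col1 - a
    d = row2 - c
    n = row1 + row2
    col2 = n - col1
    excess = max(Fraction(0), abs(a * d - b * c) - Fraction(n, 2))
    return n * excess ** 2 / (row1 * row2 * col1 * col2)


# Mainland's group A: treated 0 died, 12 survived; control 2 died, 10 survived
dist = fourfold_null_distribution(12, 12, 2)
expected_chi2_c = sum(p * yates_chi2(a, 12, 12, 2) for a, p in dist.items())
```

Here P(a = 0) comes out as 11/46, `yates_chi2(0, 12, 12, 2)` as 6/11, and the expectation over the three possible tables as 6/23 ≈ 0.2609.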

10. An advantage of the probability integral transformation. We have seen that the
probability integral transformation, using χ²_m or χ'²_m of equations (1) and (2), has approximately
the same power as the crude χ² when used in the summation of two-way tests from the
binomial distribution and fourfold tables. It has, however, the additional advantage that it
can also be used to combine one-way tests. It thus gains additional power in
those cases where the departures from the null hypothesis we are looking for are all in
one direction. This is equivalent to adding the quantity (−2 log_e ½),
or approximately 1.39, to the χ²_m or χ'²_m of every experiment which gives a probability
near zero. Of course, similar transformations could be found whereby the probability from
an experiment is transformed to the scale of χ² for any number of degrees of freedom, but
the distributions of χ² for 1 and 2 degrees of freedom have obvious advantages. First, the data are often
already available on the scale of χ or of χ² for 1 degree of freedom; secondly, the simplicity




382 



of the probability integral transformation is a very real advantage. Further, the suggested
mean value χ²_m allows significance tests to be carried out and interpreted when the summed
χ²_m is low. Such tests cannot be carried out with the ordinary χ² = −2 log_e P transformation,
since it has an expectation below the theoretical value, even when the null hypothesis is true.
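The constant 1.39 quoted in this section can be seen directly. Suppose, as the argument implies, that a one-way test halves the probability P of the corresponding two-way test when the deviation lies in the expected direction. On the −2 log_e scale the one-way result is then

```latex
-2\log_e\!\left(\tfrac{1}{2}P\right) = -2\log_e P - 2\log_e\tfrac{1}{2}
  = -2\log_e P + 2\log_e 2 \approx -2\log_e P + 1.386 .
```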


Summary 

11. Some numerical investigations have been carried out for the binomial
distribution and for the fourfold table. For the combination of significance tests, it is shown
that the usual probability integral transformation test loses much information, and a slight
modification is suggested. There appears to be no justification for the use of
χ²_c in the combination of experiments. For the combination of two-tailed significance tests, the
crude χ² is the next most powerful test after simple pooling, and it can readily be used in
the combination of fourfold tables where pooling may not be possible. When the use of
crude χ² is not practicable, it is suggested that the mean-value form of the probability integral
transformation, χ²_m, defined in equation (1), should be used. A simple approximation to
χ²_m is derived and given in equation (2); we have termed this the median value χ², χ'²_m. For
the combination of probability tests when it is desirable to take account of the direction of
the observed deviation, either the mean or the median value probability integral transformation
is superior to the crude χ².

This work was carried out while the author was a Rockefeller Fellow in Medicine. He would
like to thank Prof. A. Bradford Hill of the London School of Hygiene for the facilities of his
department, Dr J. O. Irwin and Mr P. Armitage for help in clearing up doubtful points,
and Prof. E. S. Pearson for advice and help in redrafting the article.


REFERENCES

Cochran, W. G. (1942). The χ² correction for continuity. Iowa St. Coll. J. Sci. 16, 421.

Fisher, R. A. (1932). Statistical Methods for Research Workers, 5th ed. Edinburgh: Oliver and Boyd.

Mainland, D. (1948). Statistical methods in medical research. I. Qualitative statistics (enumeration
data). Canad. J. Res. 26, 1.

Neyman, J. & Pearson, E. S. (1933). On the problem of the most efficient tests of statistical
hypotheses. Philos. Trans. A, 231, 289.

Pearson, E. S. (1938). The probability integral transformation for testing goodness of fit and
combining independent tests of significance. Biometrika, 30, 134.

Pearson, E. S. (1947). The choice of statistical tests illustrated on the interpretation of data classed
in a 2 × 2 table. Biometrika, 34, 139.

Pearson, K. (1933). On a method of determining whether a sample of size n supposed to have been
drawn from a parent population having a known probability integral has probably been drawn at
random. Biometrika, 25, 379.

Sukhatme, P. V. (1935). Proc. Ind. Acad. Sci. 2, 684.

Yates, F. (1934). Contingency tables involving small numbers and the χ² test. J. Roy. Statist. Soc.
Suppl. 1, 217.



[ 383 ] 


NOTE ON THE APPLICATION OF FISHER’S k-STATISTICS


By F. N. DAVID 


1. The k-statistics were introduced by R. A. Fisher in 1928. In a paper remarkable not
only for the brilliance of the statistical technique but also for the condensation of the
mathematical argument, he set out the essential properties of the k-statistics, demonstrated how
powers, product moments and cumulants could be evaluated, and gave basic tables for
their use. Subsequent development has been carried out by J. Wishart (1929a, b, 1930, 1933)
and M. G. Kendall (1940a, b, c, 1942). Wishart worked chiefly on the multivariate case, while
M. G. Kendall concerned himself principally with proving rigorously the rules of procedure
set out by Fisher. In the course of his analysis he gave certain methods which enable some of
the algebra involved in Fisher’s original procedure to be cut short. Nevertheless, even with
these short cuts the algebra necessary is heavy, and it appeared worth while to investigate
the possibility of shortening it still further, particularly for those cases where it is not possible
to make the κ_1 of the parent population equal to zero.

2. The technique involved in applying the k-statistics to any problem of
distribution may be summarized briefly as follows. We suppose that we wish to
find the sampling moments of some statistic involving powers and/or products of the one-part
symmetric functions

$$s_r = S(x^r) \qquad (r = 1, 2, 3, \ldots).$$

The statistic is written in terms of the k-statistics by direct substitution. Using Fisher’s
notation we write

$$\mathcal{E}(k_1^{\alpha} k_2^{\beta} k_3^{\gamma} k_4^{\delta}) = \mu'(1^{\alpha} 2^{\beta} 3^{\gamma} 4^{\delta}).$$

This product moment is connected with the product cumulants through the identity


$$\kappa(r)\theta + \kappa(s)\phi + \kappa(r^2)\frac{\theta^2}{2!} + \kappa(rs)\,\theta\phi + \kappa(s^2)\frac{\phi^2}{2!} + \cdots \equiv \log\left\{1 + \mu'(r)\theta + \mu'(s)\phi + \mu'(r^2)\frac{\theta^2}{2!} + \mu'(rs)\,\theta\phi + \mu'(s^2)\frac{\phi^2}{2!} + \cdots\right\},$$

the extension to any number of variables being obvious.

Fisher’s basic tables give expressions for the product cumulants in terms of the parent
population cumulants, and hence μ'(1^α 2^β 3^γ 4^δ) may be evaluated by substitution.
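In one variable the identity reduces to the familiar link between raw moments and cumulants, which can be written as a recursion. The sketch below uses that recursion and checks it against the fourth-moment relation quoted at the start of §3. It illustrates the identity only and is not part of Fisher's tables; the function names are ours.

```python
from math import comb


def raw_moments_from_cumulants(kappas):
    """Raw moments mu'_1..mu'_m from cumulants kappa_1..kappa_m.

    Equivalent to sum_j kappa_j t^j/j! = log(1 + sum_j mu'_j t^j/j!), in the
    recursive form mu'_n = sum_{j=1..n} C(n-1, j-1) kappa_j mu'_{n-j}.
    """
    mu = [1]  # mu'_0
    for n in range(1, len(kappas) + 1):
        mu.append(sum(comb(n - 1, j - 1) * kappas[j - 1] * mu[n - j]
                      for j in range(1, n + 1)))
    return mu[1:]


def mu4_expansion(k1, k2, k3, k4):
    """mu'_4 = kappa_4 + 4 kappa_3 kappa_1 + 3 kappa_2^2 + 6 kappa_2 kappa_1^2 + kappa_1^4."""
    return k4 + 4 * k3 * k1 + 3 * k2**2 + 6 * k2 * k1**2 + k1**4
```

For example, `raw_moments_from_cumulants([1, 2, 3, 4])` gives `[1, 3, 10, 41]`, and the last entry equals `mu4_expansion(1, 2, 3, 4)`.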

3. The relations between the product moments and the product cumulants are most
easily derived using an ingenious process due to Kendall (1940c), which
considerably reduces the labour. Since this method is not widely known, we illustrate it here
for fourth powers and two k-statistics. We start from the well-known relation for the
parent population

$$\mu_4' = \kappa_4 + 4\kappa_3\kappa_1 + 3\kappa_2^2 + 6\kappa_2\kappa_1^2 + \kappa_1^4,$$

and hence for the fourth power of the k-statistic of order r we may write

$$\mu'(r^4) = \kappa(r^4) + 4\kappa(r^3)\kappa(r) + 3\kappa^2(r^2) + 6\kappa(r^2)\kappa^2(r) + \kappa^4(r).$$


Biometrika 36





384 


Note on the application of Fisher’s k-statistics

Operating on μ'(r⁴) by s ∂/∂r, and cancelling the factor 4, we have

$$\mu'(r^3 s) = \kappa(r^3 s) + 3\kappa(r^2 s)\kappa(r) + \kappa(r^3)\kappa(s) + 3\kappa(rs)\kappa(r^2) + 3\kappa(r^2)\kappa(r)\kappa(s) + 3\kappa(rs)\kappa^2(r) + \kappa^3(r)\kappa(s),$$

where κ(r) = κ_r and κ(s) = κ_s are the parent population cumulants of order r and s. Operating again, we have

$$\mu'(r^2 s^2) = \kappa(r^2 s^2) + 2\kappa(rs^2)\kappa(r) + 2\kappa(r^2 s)\kappa(s) + \kappa(s^2)\kappa(r^2) + \kappa(s^2)\kappa^2(r) + \kappa(r^2)\kappa^2(s) + 2\kappa^2(rs) + 4\kappa(rs)\kappa(r)\kappa(s) + \kappa^2(r)\kappa^2(s).$$

These results can be checked from the fundamental identity given above. For three variables
we operate on the appropriate power of r by s ∂/∂r and t ∂/∂r, and so on.
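A single term shows how the operator acts. We read s ∂/∂r as a formal rule that changes one r into s in each cumulant argument in turn, applying the product rule over factors. On that reading, for example,

```latex
s\frac{\partial}{\partial r}\,\kappa(r^{a}s^{b}) = a\,\kappa(r^{a-1}s^{b+1}),
\qquad
s\frac{\partial}{\partial r}\left\{6\,\kappa(r^{2})\,\kappa^{2}(r)\right\}
  = 12\,\kappa(rs)\,\kappa^{2}(r) + 12\,\kappa(r^{2})\,\kappa(r)\,\kappa(s),
```

and after division by 4 these supply the terms 3κ(rs)κ²(r) + 3κ(r²)κ(r)κ(s) of μ'(r³s).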

4. Clearly, evaluating μ'(r^h s^l) when h + l = 4 or more requires a great deal of
elementary algebra. We suggest, therefore, that attention be focused on

$$\mu(r^h s^l) = \mathcal{E}(k_r-\kappa_r)^h (k_s-\kappa_s)^l.$$

It is easy to show, using the relations of § 3, that for h + l = 3 or less

$$\mu(r^h s^l) = \kappa(r^h s^l),$$

as might be expected. For h + l = 4 we have

$$\mu(r^4) = \mathcal{E}(k_r-\kappa_r)^4 = \kappa(r^4) + 3\kappa^2(r^2),$$

$$\mu(r^3 s) = \mathcal{E}(k_r-\kappa_r)^3(k_s-\kappa_s) = \kappa(r^3 s) + 3\kappa(rs)\kappa(r^2),$$

$$\mu(r^2 s^2) = \mathcal{E}(k_r-\kappa_r)^2(k_s-\kappa_s)^2 = \kappa(r^2 s^2) + \kappa(s^2)\kappa(r^2) + 2\kappa^2(rs).$$

Direct substitution involves somewhat heavy algebra, however, and it is simpler
to note that

$$\mathcal{E}(k_r-\kappa_r)^3(k_s-\kappa_s) \quad\text{and}\quad \mathcal{E}(k_r-\kappa_r)^2(k_s-\kappa_s)^2$$

can be derived from

$$\mu_4 = \kappa_4 + 3\kappa_2^2 \quad\text{and}\quad \mu(r^4) = \kappa(r^4) + 3\kappa^2(r^2)$$

by Kendall’s operative process just described. The relations for h + l = 5, 6 may thus easily
be shown to be

$$\mathcal{E}(k_r-\kappa_r)^5 = \mu(r^5) = \kappa(r^5) + 10\kappa(r^3)\kappa(r^2),$$

$$\mu(r^4 s) = \kappa(r^4 s) + 6\kappa(r^2 s)\kappa(r^2) + 4\kappa(r^3)\kappa(rs),$$

$$\mu(r^3 s^2) = \kappa(r^3 s^2) + 6\kappa(r^2 s)\kappa(rs) + 3\kappa(rs^2)\kappa(r^2) + \kappa(r^3)\kappa(s^2),$$

$$\mathcal{E}(k_r-\kappa_r)^6 = \mu(r^6) = \kappa(r^6) + 15\kappa(r^4)\kappa(r^2) + 10\kappa^2(r^3) + 15\kappa^3(r^2),$$

$$\mu(r^5 s) = \kappa(r^5 s) + 5\kappa(r^4)\kappa(rs) + 10\kappa(r^3 s)\kappa(r^2) + 10\kappa(r^3)\kappa(r^2 s) + 15\kappa^2(r^2)\kappa(rs),$$

$$\mu(r^4 s^2) = \kappa(r^4 s^2) + \kappa(r^4)\kappa(s^2) + 8\kappa(r^3 s)\kappa(rs) + 6\kappa(r^2 s^2)\kappa(r^2) + 6\kappa^2(r^2 s) + 4\kappa(r^3)\kappa(rs^2) + 12\kappa(r^2)\kappa^2(rs) + 3\kappa^2(r^2)\kappa(s^2),$$

$$\mu(r^3 s^3) = \kappa(r^3 s^3) + 3\kappa(r^3 s)\kappa(s^2) + 9\kappa(r^2 s^2)\kappa(rs) + 3\kappa(rs^3)\kappa(r^2) + 9\kappa(r^2 s)\kappa(rs^2) + \kappa(r^3)\kappa(s^3) + 6\kappa^3(rs) + 9\kappa(r^2)\kappa(s^2)\kappa(rs).$$
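Kendall's operative process is mechanical enough to automate. The sketch below is our own illustration, not Kendall's or the author's code. It stores each expression as a map from products of cumulants to coefficients, applies s ∂/∂r to every factor in turn, and divides by h. Starting from μ(r⁶) it reproduces the expressions given above for μ(r⁵s), μ(r⁴s²) and μ(r³s³). Starting from μ'(r⁴) it reproduces μ'(r³s) of § 3.

```python
from collections import Counter
from fractions import Fraction

# Notation chosen for this sketch: the pair (a, b) stands for the product
# cumulant kappa(r^a s^b); a product of cumulants is a sorted tuple of pairs,
# and an expression maps each product to its numerical coefficient.


def kendall_step(expr, h):
    """Turn an expression for mu(r^h s^l) into one for mu(r^(h-1) s^(l+1)).

    s d/dr replaces each factor kappa(r^a s^b), one at a time, by
    a * kappa(r^(a-1) s^(b+1)) (product rule); dividing by h cancels the
    factor h produced on the left-hand side.
    """
    out = Counter()
    for factors, coeff in expr.items():
        for i, (a, b) in enumerate(factors):
            if a == 0:
                continue  # a factor free of r is a constant for d/dr
            new = factors[:i] + ((a - 1, b + 1),) + factors[i + 1:]
            out[tuple(sorted(new))] += coeff * a
    return {f: Fraction(c, h) for f, c in out.items() if c != 0}


# mu(r^6) = kappa(r^6) + 15 kappa(r^4)kappa(r^2) + 10 kappa^2(r^3) + 15 kappa^3(r^2)
mu_r6 = {((6, 0),): 1, ((2, 0), (4, 0)): 15,
         ((3, 0), (3, 0)): 10, ((2, 0), (2, 0), (2, 0)): 15}
mu_r5s = kendall_step(mu_r6, 6)
mu_r4s2 = kendall_step(mu_r5s, 5)
mu_r3s3 = kendall_step(mu_r4s2, 4)

# The same step applied to the raw moment mu'(r^4), where kappa(r) = (1, 0) occurs
mu_raw_r4 = {((4, 0),): 1, ((1, 0), (3, 0)): 4, ((2, 0), (2, 0)): 3,
             ((1, 0), (1, 0), (2, 0)): 6, ((1, 0),) * 4: 1}
mu_raw_r3s = kendall_step(mu_raw_r4, 4)
```

Because the coefficients are kept as exact fractions, the output can be compared term by term with the printed relations. Representing κ(r^a s^b) as the pair (a, b) is simply a convenience of this sketch.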

5. To avoid confusion with Fisher’s notation we shall write

$$\mu(r^h s^l) = \mathcal{E}(k_r-\kappa_r)^h (k_s-\kappa_s)^l = (r^h s^l).$$



F. N. David 385 

If a table of (r^h s^l) functions were available, some of the algebra
involved in applying the k-statistic technique could be avoided. For example, consider
the almost classical problem of finding the moments of

$$x = \left\{\frac{(n-1)(n-2)}{6n}\right\}^{1/2}\frac{k_3}{k_2^{3/2}}$$

in samples of n drawn from a normal population, and take the process of obtaining E(x²).
We have

$$\mathcal{E}(x^2) = \frac{(n-1)(n-2)}{6n}\,\mathcal{E}\left(\frac{k_3^2}{k_2^3}\right) = \frac{(n-1)(n-2)}{6n\kappa_2^3}\,\mathcal{E}\left[k_3^2\left(1+\frac{k_2-\kappa_2}{\kappa_2}\right)^{-3}\right].$$

The method outlined by Fisher (1928) and quoted by Kendall (1943) consists in expanding
the right-hand bracket and taking expectations:

$$\mathcal{E}(x^2) = \frac{(n-1)(n-2)}{6n\kappa_2^3}\left[\mu'(3^2) - \frac{3}{\kappa_2}\mu'(3^2 2) + \frac{6}{\kappa_2^2}\mu'(3^2 2^2) - \frac{10}{\kappa_2^3}\mu'(3^2 2^3) + \cdots\right].$$

The product moments are next expressed in terms of the product cumulants κ(r^h s^l). Remembering
that k_2 is measured from its expected value, we have

$$\mathcal{E}(x^2) = \frac{(n-1)(n-2)}{6n\kappa_2^3}\left[\kappa(3^2) - \frac{3}{\kappa_2}\kappa(3^2 2) + \frac{6}{\kappa_2^2}\left\{\kappa(3^2 2^2) + \kappa(3^2)\kappa(2^2)\right\} - \frac{10}{\kappa_2^3}\left\{\kappa(3^2 2^3) + 3\kappa(3^2 2)\kappa(2^2) + \kappa(3^2)\kappa(2^3)\right\} + \cdots\right].$$

Fisher’s tables give the product cumulants in terms of the parent population
cumulants, which completes the process. As an alternative procedure, suppose we write

$$\mathcal{E}(x^2) = \frac{(n-1)(n-2)}{6n}\,\frac{\kappa_3^2}{\kappa_2^3}\,\mathcal{E}\left[\left(1+\frac{k_3-\kappa_3}{\kappa_3}\right)^{2}\left(1+\frac{k_2-\kappa_2}{\kappa_2}\right)^{-3}\right].$$

On expanding and taking expectations we obtain the (r^h s^l) functions immediately. All
that remains is to substitute from tables of these functions to obtain E(x²) at once in terms
of the parent population cumulants. The saving in heavy algebra is considerable, even
though the original expansion is made a little more clumsy.*

6. A complete table of (r^h s^l) functions is obviously not necessary. For h + l ≤ 3 the
expressions for these functions are the product cumulants already tabled by Fisher or by Kendall.
When h + l > 3 a corrective term must be added to the product cumulant. Table 1 gives
a selection of these corrective terms, expressed as functions of the parent-population
cumulants, which have proved useful in certain recent investigations. For deciding order
in expansions, note that product cumulants of weight r are of order 1/n^(r−1).
The corrective terms to a product cumulant of weight r are of order 1/n^(r−2) if they
consist of two product cumulants multiplied together, of order 1/n^(r−3) if they consist
of three, and so on. Thus for h + l = 6 the product cumulant κ(r^h s^l) is of order 1/n⁵, a product of two
product cumulants is of order 1/n⁴, and a product of three, such as κ³(r²), is of order 1/n³.
If, therefore, no terms are required beyond those involving 1/n³, it

* For symmetrical populations it is of course simpler to note that k_2 and k_3 are uncorrelated, and to
turn the product of the expectations into (r^h s^l) functions.





386 




will only be necessary to enumerate a single corrective term as the contribution from
κ(r^h s^l) (h + l = 6). Table 1, printed on p. 392 below, gives the complete corrective terms
for h + l = 4, 5, and those corrective terms for h + l = 6 which involve 1/n³.
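The order rule of § 6 follows from multiplying the orders of the individual factors. Write w_1, ..., w_j for the weights of j product cumulants whose weights sum to w; this notation is introduced here only for this step. Then

```latex
\prod_{i=1}^{j} O\!\left(n^{-(w_i-1)}\right) = O\!\left(n^{-(w-j)}\right),
\qquad\text{so that for } w = 6:\quad j = 1, 2, 3 \;\Rightarrow\; n^{-5},\; n^{-4},\; n^{-3}.
```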

7. The validity of the expansions in § 5 is a troublesome question. I have given
elsewhere reasons why such a process can be justified approximately for
some statistics. For other statistics it is difficult to justify except in a limited range of cases.
The coefficient of variation is an example. For a sample
of n elements we define it as

$$v = \frac{s}{\bar{x}},$$

where x̄ has its usual meaning and

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2.$$

In samples from a normal population, whatever the mean and standard
deviation of the parent population, there is clearly a non-zero probability that x̄ is close to zero or
negative. Hence the true distribution of v has, theoretically, infinite moments. To
obtain approximations to the moments of v, therefore, we must truncate
the parent population at the point x = 0. We must also assume that
n is large enough for the proportion of the frequency lying outside a given multiple of
the standard deviation of the mean, and of the standard deviation of the variance, to be
negligible. Under these two assumptions the expansion will be valid. If the parent population
is not normal but is, for example, a Pearson curve with its start fixed at zero or at some positive
value, then only the assumption about the sample size n is needed.

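8. We assume a parent population with cumulants κ_1, κ_2, ... up to
any order required. A sample of n elements is drawn randomly and independently from such
a population, and to each element of the sample we attach a random variable x_1, x_2, ..., x_n.
We may then write

$$v = \frac{k_2^{1/2}}{k_1} = V\left(1+\frac{k_2-\kappa_2}{\kappa_2}\right)^{1/2}\left(1+\frac{k_1-\kappa_1}{\kappa_1}\right)^{-1}, \qquad V = \frac{\kappa_2^{1/2}}{\kappa_1},$$

where V is the coefficient of variation in the parent population. Under the restrictions just
mentioned we may expand the right-hand side of the expression for v. For example, for the
second moment about an arbitrary origin we obtain

$$\mu_2'\left(\frac{v}{V}\right) = \mathcal{E}\left[\left(1+\frac{k_2-\kappa_2}{\kappa_2}\right)\left\{1-\frac{2(k_1-\kappa_1)}{\kappa_1}+\frac{3(k_1-\kappa_1)^2}{\kappa_1^2}-\frac{4(k_1-\kappa_1)^3}{\kappa_1^3}+\frac{5(k_1-\kappa_1)^4}{\kappa_1^4}-\cdots\right\}\right],$$

or, in terms of the (r^h s^l) functions,

$$\mu_2'\left(\frac{v}{V}\right) = 1-\frac{2}{\kappa_1}(1)+\frac{3}{\kappa_1^2}(1^2)-\frac{4}{\kappa_1^3}(1^3)+\frac{5}{\kappa_1^4}(1^4)-\cdots+\frac{1}{\kappa_2}\left\{(2)-\frac{2}{\kappa_1}(21)+\frac{3}{\kappa_1^2}(21^2)-\frac{4}{\kappa_1^3}(21^3)+\cdots\right\}.$$

Clearly (1) and (2) are zero, and the other terms may be written down from tables
of product cumulants and from Table 1 of corrective terms.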





387 


9. The first four sampling moments of the distribution of v/V, to order 1/n², are as follows:
1 .. . , / v \ 1 / 1 * 4 1 * 3 k»\ 1 


J_ / K i _ 3 K i K a . ^6 i K l . K s 3 *4 5*o 3*f \ 

n 2 \ 128*| 16*1*4 I* 3 *! 4*1*1 8 * 4 *l + 8 *l* a 2*4 *f / 

— I ) 2 \4*1 + 32/ ’ 


9 *4 1 * 3 

1* 2 \ 

32*1 8*2*4 

4*1/ 


1 

(72' 


(«- 




+ 


2(71-1) 


n i 


+ i _ 1 *s , 6 K i 10/<: a 8 *l\ 

' 2 \32*1 4*1*4 8*1 4*1*1 4*4*1 2*1* 2 *1 + ** / 

1 /5 * 4 1 * 3 * 2 \_1_/1*| 1\ 

77 ( 77 —1)\8*| 2*2*4 *!/ (71-l) 2 \2*l + 8/’ 


1 

P 


,3 _ U _ A/_ 3 K \ . 1k 6 , & *3 3 K 5 

^ 3 \F/ 77 2 \ 16*1' 8*14*1*1 4*1*4 


3* 4 10* s 6*1 

+ *2*1 *f + *1 


+ - 


1 3/Cg 


I) 

*3 3* 2 \ 1 /1 *3 , 1\ 

;*4 + 7i) + (^ip\2/ri + 4)’ 


±<?(v-£(»))* = H 


77(77-1)\4*1 *2 

_ lv \ 1 / 3 *1 3*4*3 , 3 *3 . 3 * 4 *| 6* 3 \ 

~^ 4 \F/ 77 2 \16*2 2*1*4"^ *1*| + 2*1*2 + 0 *4 k\) 

1 /3*4 3* 3 3* a \ _ _ 

' 77(77 1) \4*1 *2*1 *1/ (77-l) 2 \4/‘ 

For samples from a symmetrical population: 

U ,(A _ 1 , l(-~^ , Kn\ _1_ + 1/ ii^ + _^o_ + S^ + M 

^w" +w \ 8*2 *1/ 4(77-l) + 77 2 \ 128*f + 16*1 ' 8*1*2 *1/ 


+ 


_1_ 


77(77- 1)^32*1 4 K\J + 32(77-l) 2 ’ 


Jl\ = l(^ + K A 

H \vj 77\4*| *1/ 
ll\-Ll A'j i K ° 1 3fC * 1 

?7 2 \ 16/4 8*j> * 2 * a *f / 

_ 1 / 3 *1 | 3 *4 | 3*I^| | 


1_ 1 / 7 *g *6 5 * 4 8*j\ 


2(t7— 1) ^77^32*1 8*1 + 2*1*2 


+ - 


1 


/3*4 3* 


77 ( 77 - 1) \4*1 
1 /3(4 3*2 


+ 


l*i\_1_/6* 4 * 2 \ 1 

*1 / 77 ( 77 -1)\8*1 * 1 / 8 ( 77 -1) 2 ’ 

3*j\ 

* 1 / 


4(77 —l) 2 ’ 


77 2 \l6*t + 2*l*a *f p 77 ( 77 -1)\4*1 
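For samples from a normal population:

$$\mu_1'\left(\frac{v}{V}\right) = 1 + \frac{V^2}{n} - \frac{1}{4(n-1)} + \frac{3V^4}{n^2} - \frac{V^2}{4n(n-1)} + \frac{1}{32(n-1)^2},$$

$$\mu_2\left(\frac{v}{V}\right) = \frac{V^2}{n} + \frac{1}{2(n-1)} + \frac{8V^4}{n^2} + \frac{V^2}{n(n-1)} - \frac{1}{8(n-1)^2},$$

$$\mu_3\left(\frac{v}{V}\right) = \frac{6V^4}{n^2} + \frac{3V^2}{n(n-1)} + \frac{1}{4(n-1)^2},$$

$$\mu_4\left(\frac{v}{V}\right) = \frac{3V^4}{n^2} + \frac{3V^2}{n(n-1)} + \frac{3}{4(n-1)^2}.$$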







388

10. For the truncation of the parent population at the point x = 0 to play
as small a part as possible, V must be chosen small. Thus for a normal
parent population we assume that V is never greater than 1/3; in other words, the
truncation is made at a point more than 3σ below the mean. In
practical problems it is rare to find a V as large as this. The manipulative
algebra involved in obtaining the approximations to the moments given above is elementary
but heavy, and it would be satisfactory to apply a check of some kind.
This is difficult for the general case. We may note that for the normal population, with V
small and n large, we have

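$$\mu_2\left(\frac{v}{V}\right) \approx \frac{1+2V^2}{2n},$$

a value given, it is believed, originally by K. Pearson. For the general case Kendall (1943)
gives

$$\mu_2\left(\frac{v}{V}\right) \approx \frac{1}{n}\left(\frac{\mu_4-\mu_2^2}{4\mu_2^2} - \frac{\mu_3}{\mu_1'\mu_2} + \frac{\mu_2}{\mu_1'^2}\right) = \frac{1}{n}\left(\frac{1}{2} + \frac{\kappa_4}{4\kappa_2^2} - \frac{\kappa_3}{\kappa_1\kappa_2} + \frac{\kappa_2}{\kappa_1^2}\right),$$

which, for large n, agrees with the leading terms of the expression for μ_2(v/V) in the general
case given in the preceding section. These small checks were all that could be found.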

11. Despite the restrictions needed to use the expansion, it is possible
to get some idea of the distribution of v/V from the approximate moments, particularly if
all that is required is to gauge the sample size above which it is reasonable to
assume that v is normally distributed. We take the case where the parent population is
normal; the momental constants for selected values of V and n are given in Table 2.

Table 2. Momental constants of the distribution of v/V when the parent population is normal


K 

cr 

/?i 

A 

Mi 






0970 

0-346 

031 

2-93 

0-966 

09865- 

0-246 

017 

2-94 

0-9786 

09929 

0175 

009 

2-97 

0-9894 

09963 

0143 

0-06 

2-98 

0-9930 

09966- 

0-124 

0-06 

2-98 

0-9947 


0961 

09759 

0-9880 

09920 

0-9940 


Judged from the approximate moments, the distribution of v/V clearly tends to
normality reasonably quickly as the sample size increases, and for samples of over 40
little error will be made in assuming that v/V is normally distributed.

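12. McKay (1932) considered the distribution of

$$\frac{nv^2}{1+v^2}\cdot\frac{1+V^2}{V^2}, \qquad\text{where}\qquad v = \frac{s}{\bar{x}} \quad\text{and}\quad s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2.$$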



389 


He showed that this quantity is approximately distributed as χ² with n − 1 degrees of freedom
when the parent population is normal. In the previous sections of this paper we have been
considering

$$v_1 = \frac{s_1}{\bar{x}},$$

where s_1² is the unbiased estimate of the population variance. It follows that we
might expect

$$\frac{(n-1)v_1^2}{1+(n-1)v_1^2/n}\cdot\frac{1+V^2}{V^2}$$

to be approximately distributed as χ² with n − 1 degrees of freedom in samples from a normal
population. We may use McKay's approximation to check the adequacy of the
assumption that v/V is normally distributed when the parent population
generating the sample is normal. The procedure is as follows. We choose a significance level
for χ², say 5 % for the sake of illustration, and for a given n and V we calculate the value v_0.05
corresponding to this level. Then, assuming v/V is normally distributed, we calculate the
tail area beyond this v_0.05. The results of such calculations for two values of n and
three values of V are given in Table 3. They again suggest that assuming normality
for v/V is not likely to lead to serious error in tests of significance, provided the sample is
of reasonable size.
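The procedure just described can be sketched in Python. This is a rough illustration under our own assumptions. The normal-population expressions for the mean and variance of v/V in § 9 supply the normal approximation, and McKay's criterion is solved for v_0.05 in the form written above. The χ² percentage point comes from the Wilson–Hilferty approximation rather than from tables. The sketch is not guaranteed to reproduce the third decimal of Table 3, and the function names are ours.

```python
import math
from statistics import NormalDist


def chi2_upper_point_wh(alpha, df):
    """Wilson-Hilferty approximation to the upper alpha point of chi-square."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3


def normal_moments_v_over_V(n, V):
    """Mean and variance of v/V for a normal parent, to order 1/n^2."""
    mean = (1 + V**2 / n - 1 / (4 * (n - 1)) + 3 * V**4 / n**2
            - V**2 / (4 * n * (n - 1)) + 1 / (32 * (n - 1) ** 2))
    var = (V**2 / n + 1 / (2 * (n - 1)) + 8 * V**4 / n**2
           + V**2 / (n * (n - 1)) - 1 / (8 * (n - 1) ** 2))
    return mean, var


def tail_area_at_mckay_point(n, V, alpha=0.05):
    """Normal-approximation tail area of v/V beyond McKay's alpha point.

    McKay's criterion (n-1) v1^2 (1+V^2) / {V^2 (1 + (n-1) v1^2 / n)} is set
    equal to the chi-square point on n-1 d.f. and solved for v1; the upper
    tail of a normal curve with the moments above is evaluated there.
    """
    c = chi2_upper_point_wh(alpha, n - 1)
    v1_sq = c / ((n - 1) * ((1 + V**2) / V**2 - c / n))
    ratio = math.sqrt(v1_sq) / V
    mean, var = normal_moments_v_over_V(n, V)
    return 1.0 - NormalDist(mean, math.sqrt(var)).cdf(ratio)


for n in (101, 41):
    print(n, [round(tail_area_at_mckay_point(n, V), 3) for V in (1/3, 1/5, 1/10)])
```

Like the table, this sketch gives tail areas a little below the nominal 0.05 for both sample sizes.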


Table 3. Estimated probability integral of the distribution of v/V corresponding to χ²_0.05

n       V = 1/3   V = 1/5   V = 1/10
101     0.047     0.048     0.049
41      0.046     0.046     0.047


13. As n increases (see, for example, Table 2), the distribution of v/V becomes
less and less dependent on V. We therefore consider abbreviating the expression
for the second moment. Four possibilities may be considered. We assume that for a
normal parent population v/V is normally distributed about unity with standard errors
equal to

$$\text{(i)}\ \left(\frac{1}{2n}\right)^{1/2},\qquad \text{(ii)}\ \left(\frac{1+2V^2}{2n}\right)^{1/2},\qquad \text{(iii)}\ \left(\frac{V^2}{n}+\frac{1}{2(n-1)}\right)^{1/2},$$

$$\text{(iv)}\ \left(\frac{V^2}{n}+\frac{1}{2(n-1)}+\frac{8V^4}{n^2}+\frac{V^2}{n(n-1)}-\frac{1}{8(n-1)^2}\right)^{1/2}.$$

The results for sample size 41 and different values of V are given in Table 4.


Table 4. Various estimates of standard error of v/V for samples of size 41

Expression for
standard error    V = 1/3   V = 1/5   V = 1/10
(i)               0.110     0.110     0.110
(ii)              0.122     0.115     0.112
(iii)             0.123     0.116     0.113
(iv)              0.124     0.116     0.113





390 



Approximation (i) is clearly inadequate for the sample size considered. (ii) is not far from
(iii) and (iv), and will probably be good enough for rough tests of significance. In fact,
since it gives values smaller than those obtained from (iv), the probability integral calculated
from it will be closer to the true significance level as judged by the χ² distribution.
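The entries of Table 4 follow directly from expressions (i) to (iv). The short check below assumes that the table's columns correspond to V = 1/3, 1/5 and 1/10; the function name is ours.

```python
import math


def standard_errors(n, V):
    """Expressions (i)-(iv) for the standard error of v/V, normal parent."""
    se_i = math.sqrt(1 / (2 * n))
    se_ii = math.sqrt((1 + 2 * V**2) / (2 * n))
    se_iii = math.sqrt(V**2 / n + 1 / (2 * (n - 1)))
    se_iv = math.sqrt(V**2 / n + 1 / (2 * (n - 1)) + 8 * V**4 / n**2
                      + V**2 / (n * (n - 1)) - 1 / (8 * (n - 1) ** 2))
    return se_i, se_ii, se_iii, se_iv


for V in (1/3, 1/5, 1/10):
    print(f"V = {V:.3f}:", [round(se, 3) for se in standard_errors(41, V)])
```

Rounded to three decimals, this reproduces every entry of Table 4 for n = 41.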

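14. As might indeed be expected, the moments of the distribution of the standard
deviation estimated from normal samples are similar to
those of the distribution of v/V. Approximations to the moments of the distribution of the
standard deviation can be found quickly using the k-statistic technique. Writing

$$s_e = k_2^{1/2},$$

we have

$$\mu_1'\left(\frac{s_e}{\kappa_2^{1/2}}\right) = 1 + \frac{1}{2\kappa_2}(2) - \frac{1}{8\kappa_2^2}(2^2) + \frac{1}{16\kappa_2^3}(2^3) - \frac{5}{128\kappa_2^4}(2^4) + \cdots$$

$$= 1 - \frac{\kappa_4}{8n\kappa_2^2} - \frac{1}{4(n-1)} + \frac{1}{n^2}\left(\frac{\kappa_6}{16\kappa_2^3} - \frac{15\kappa_4^2}{128\kappa_2^4}\right) + \frac{1}{n(n-1)}\,\frac{9\kappa_4}{32\kappa_2^2} + \frac{1}{(n-1)^2}\left(\frac{1}{32} + \frac{\kappa_3^2}{4\kappa_2^3}\right),$$

and for a normal population we have

$$\mu_1'\left(\frac{s_e}{\kappa_2^{1/2}}\right) = 1 - \frac{1}{4(n-1)} + \frac{1}{32(n-1)^2}, \qquad \mu_2\left(\frac{s_e}{\kappa_2^{1/2}}\right) = \frac{1}{2(n-1)} - \frac{1}{8(n-1)^2},$$

$$\mu_3\left(\frac{s_e}{\kappa_2^{1/2}}\right) = \frac{1}{4(n-1)^2}, \qquad \mu_4\left(\frac{s_e}{\kappa_2^{1/2}}\right) = \frac{3}{4(n-1)^2}.$$

Thus the moments of s_e/κ_2^(1/2) are those of v/V with V put equal to zero in the final
expansion. Since the distribution of s_e tends fairly quickly to normality, we may expect
the distribution of v/V to do likewise.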
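For a normal parent the mean of s_e/κ_2^(1/2) is known exactly: E(s_e/σ) = {2/(n − 1)}^(1/2) Γ(n/2)/Γ{(n − 1)/2}. The order-1/n² results above can therefore be compared with it. The comparison is our own addition, sketched in Python with function names of our choosing.

```python
import math


def exact_mean_ratio(n):
    """Exact E(s_e / sigma) for a normal sample of size n (divisor n - 1)."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(
        math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0))


def series_mean_and_var(n):
    """Order-1/n^2 normal-theory mean and variance of s_e / kappa_2^(1/2)."""
    mean = 1 - 1 / (4 * (n - 1)) + 1 / (32 * (n - 1) ** 2)
    var = 1 / (2 * (n - 1)) - 1 / (8 * (n - 1) ** 2)
    return mean, var


for n in (11, 41):
    exact_mean = exact_mean_ratio(n)
    exact_var = 1 - exact_mean ** 2  # because E(s_e^2) = sigma^2 exactly
    mean, var = series_mean_and_var(n)
    print(n, exact_mean - mean, exact_var - var)
```

For n = 11 the series and exact values agree to better than 10⁻⁴ in both mean and variance; for n = 41 they agree to about 10⁻⁶.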

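15. We return to the approximate moments of v/V and consider the case where the parent
population is not normal. We assume that the parent population can be described by
a Gram–Charlier series, Type A, and write for its functional form

$$f(x) = \frac{1}{(2\pi\kappa_2)^{1/2}}\,e^{-z^2/2}\left\{1 + \frac{\kappa_3}{6\kappa_2^{3/2}}H_3(z) + \frac{\kappa_4}{24\kappa_2^2}H_4(z)\right\}, \qquad z = \frac{x-\kappa_1}{\kappa_2^{1/2}},$$

where H_3(z) and H_4(z) are the Hermite polynomials of orders 3 and 4 respectively. This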
population has cumulants

$$\kappa_1,\ \kappa_2,\ \kappa_3,\ \kappa_4,\ \kappa_5 = 0,\ \kappa_6 = -10\kappa_3^2,\ \kappa_7 = -35\kappa_4\kappa_3,\ \kappa_8 = -35\kappa_4^2, \text{ and so on.}$$

Putting κ_4 = 0 in this population eliminates any effect due to kurtosis. Similarly,
giving κ_4 a non-zero value and letting κ_3 = 0 lets us study the effect of kurtosis when there is
no asymmetry. The moments of the criterion v/V take simple forms in these
two cases. Writing the moments of v/V for the normal population, that is, when κ_3 = κ_4 = 0,
as μ'_1(v/V | N), μ_2(v/V | N), ..., and putting

= fc-a-g-y* 

K-2 k 2 

we have the following expressions: 





391 


Moments of v/V when κ_4 = 0 in the Gram–Charlier population:




,/ 

V 

/hi 

v) 

= /h| 

V 


'v\ 


'v 


j) 

= M 

V 




V 


7 

= /hl 

J 




V 

H 

w 

= /h 

h 


^7i + 2u(n-l) + n 2 


- 

1 ) 2 n 2 J 

10F 2 \ 


9F 2 ' 
■ l) 3 "^ n 2 ; 




n{n~L) n 2 


/4 \F/ ™\F|~7 1 n 2 J ' 1) 1 « 2 /' 

Moments of v/V when κ_3 = 0 in the Gram–Charlier population:

“'‘*(?| W ) +y {i-S^T) + IS] + 3 


+ 32» 272; 


7?) -7?| w ) +y {te5r_-lj + 5]-i£-.rS. 

=711*) +y {s7^i) + S] + il? y - 


A third alternative would be to give both κ_3 and κ_4 non-zero values, and to study the effect
of combined asymmetry and kurtosis in the parent population.
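The cumulants listed in § 15 for the Type A population can be confirmed by expanding the logarithm of its moment generating function. The sketch assumes that the generating function is exp(κ_1θ + κ_2θ²/2)(1 + κ_3θ³/3! + κ_4θ⁴/4!), which is what the Hermite terms of orders 3 and 4 produce. The function name is ours.

```python
from fractions import Fraction
from math import factorial


def type_a_cumulants(k3, k4, max_order=8):
    """Cumulants of order 3..max_order of a Gram-Charlier Type A population.

    Assumes the moment generating function
        exp(k1*t + k2*t^2/2) * (1 + k3*t^3/3! + k4*t^4/4!),
    so that, beyond order 2, the cumulant generating function is log(1 + y)
    with y = k3*t^3/6 + k4*t^4/24.  Series are lists indexed by the power of t.
    """
    y = [Fraction(0)] * (max_order + 1)
    y[3], y[4] = Fraction(k3) / 6, Fraction(k4) / 24

    log_series = [Fraction(0)] * (max_order + 1)
    power = [Fraction(1)] + [Fraction(0)] * max_order  # y**0
    for m in range(1, max_order // 3 + 1):  # y**m starts at t**(3m)
        new = [Fraction(0)] * (max_order + 1)
        for i, p in enumerate(power):
            for j, q in enumerate(y):
                if p and q and i + j <= max_order:
                    new[i + j] += p * q
        power = new
        sign = 1 if m % 2 == 1 else -1  # log(1+y) = y - y^2/2 + y^3/3 - ...
        for r in range(max_order + 1):
            log_series[r] += sign * power[r] / m
    return {r: log_series[r] * factorial(r) for r in range(3, max_order + 1)}
```

With arbitrary values for κ_3 and κ_4 the result returns κ_3 and κ_4 unchanged, κ_5 = 0, and κ_6, κ_7, κ_8 equal to −10κ_3², −35κ_4κ_3 and −35κ_4², as stated.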

16. Once the moments of v/V have been obtained for some non-normal parent
population, the next step is a matter of choice. One may study the
β_1 and β_2 of the distribution of v/V and fit the appropriate Pearson curve using the first four
moments. Alternatively, one may choose for v/V a distribution having the same functional form as the
parent population. Either procedure allows the probability
integral to be estimated.

17. Pearl (1905) gives the momental constants for the distribution of brain-weights in
a number of races. Taking the first line of his table, we note that 413 Swedish males had
brains of average weight 1400.481 g. The coefficient of variation was 0.07592, and the
β_1 and β_2 of the distribution were 0.0287 and 2.7964 respectively. We assume that these momental
constants are those of the true distribution of Swedish male brain-weights. We then ask the
following question. Suppose a sample of 51 observations is available and known to come from the
population with V = 0.07592. What difference does it make to the expected mean and
standard error of v/V if we assume the population to be normal, instead
of having the β_1 and β_2 given above? Substituting in the general formula we have

μ'_1(v/V | N) = 0.9951,   μ_2(v/V | N) = 0.0101,   σ_N = 0.100;

μ'_1(v/V | β_1, β_2) = 0.9955,   μ_2(v/V | β_1, β_2) = 0.0089,   σ_(β_1, β_2) = 0.094.

For this population, therefore, the mean and standard deviation of v/V appear to be little
changed from the values obtained by assuming normality. It
appears likely that, for anthropometric data, where the deviations from normality are
never very marked and the sample size is usually fairly large, the expressions for the mean
and standard deviation of v/V for a normal parent population will be adequate.
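The normal-population figures quoted for Pearl's data can be recomputed from the order-1/n² expressions in § 9 for the mean and variance of v/V with a normal parent. A minimal sketch with n = 51 and V = 0.07592 follows; the function name is ours. The figures for the non-normal case need the general formulae and are not reproduced here.

```python
import math


def normal_mean_and_var_of_v_over_V(n, V):
    """Mean and variance of v/V for a normal parent, to order 1/n^2."""
    mean = (1 + V**2 / n - 1 / (4 * (n - 1)) + 3 * V**4 / n**2
            - V**2 / (4 * n * (n - 1)) + 1 / (32 * (n - 1) ** 2))
    var = (V**2 / n + 1 / (2 * (n - 1)) + 8 * V**4 / n**2
           + V**2 / (n * (n - 1)) - 1 / (8 * (n - 1) ** 2))
    return mean, var


mean, var = normal_mean_and_var_of_v_over_V(51, 0.07592)
sd = math.sqrt(var)
print(f"mean = {mean:.4f}, variance = {var:.4f}, sd = {sd:.3f}")
```

This prints `mean = 0.9951, variance = 0.0101, sd = 0.100`, agreeing with the normal-case values in the text.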


I should like to thank Professor E. S. Pearson and Professor M. G. Kendall for helpful
criticism.



392 


K(r h s l ) 
*( 1") 

/c(2 l 3 ) 
*( 2 2 ) 2 ) 
a:(2 3 1) 

*(»*) 

< 3 l 3 ) 
*(3 2 l 2 ) 
at(3° 1) 
*(3 4 ) 

*( 41 3 ) 
k( 4 2 l 2 j 
/c(4 3 1) 
*(3 2 3 ) 
*(3 2 2 2 ) 

/c(3 3 2) 

/c(r ft s 1 ) 
*( 1 5 ) 
*(2 l 4 ) 
*(2 2 l 3 ) 
*( 2 3 1 2 ) 
*(2 4 1) 
*(2 3 ) 



Table 1. Terms to be added to κ(r^h s^l) to give (r^h s^l)

h + l = 4

Add. 

3*1 

ri 2 

3*2*3 

ft 2 

* 4*2 2 *jj 2 /eg 

Jl 2 ft 2 w(w—1) 

3*4* 3 ^ 6*3*3 
w 2 T w(«, —1) 

3*4 12*4*2 12*3 

n 2 «(«■—!) (»— l ) 2 

3*4*2 

n 2 

*3*2 2*4 9*4 * 2 , 3 * 2 *2 6*2 _ 

m 2 n 2 n(n— 1 ) w(n— 1 ) (n —l)(w— 2 ) 

3* 6 *4 27* 2 *2 27*4*3 13*4 *2 

n 2 + n(w- 1 ) + n(w— 1 ) (n—l)(n — 2 ) 

3*J 64* 0 *4* 2 64*3*1 36* 0 *2 486*j*|* a 324* 4 *Jw 324* 2 * 2 n 

~n? + n(n-l) + n(n-l) + (n~l)(n-2) + (n- 1) 2 + (n- 1) 2 (w- 2 ) + \n- 1) 2 (w- 2 ) 

243*5* 2 243*1 108*|n 2 

+ '(n - 1) 2 + (n - 1) 2 + (n - l) 2 (w- 2) 2 

3*fl*2 

n 2 

* 3*2 16*c*| 2*jj 48*5*3*2 34*5*2 72*4*11 144*5*5 24*g(w+l) 

n 2 "w(w—1) + n 2 w(n—1) n(n— 1) (n—l)(n —2) (ra.— l)(w — 2) (n— \){n— 2)(n-3) 

3*8*s | 48 * 3 * 5*2 | 444*5*3 . 102*g*4 216 * 5 * 4*5 432* 6 * 2 * a 72* 6 * 4 (w+l) 

n 2 n(n— 1) n{n— 1) ti(n— 1) (n— l)(n — 2) (n —l)(n—2) (w— \)(n — %){n— 3) 

3 * 5*4 , 6*5*| 18*4* 3 *2 . 36 , 

m 2 n(n— 1 ) «(?i — 1 ) (n— l ) 2 3 2 

*0*4 2*3*5 . 2*5 24*5*3*2 . 9*4*2 6*4*3 18* 4 * 3 

n 2 n(m— 1 ) w 2 n(n—l) n(n — l) n(n— 1 ) («—l ) 2 

6* 4 *5 90*?*5 12*5»i 

+ (w-l)(n- 2 j + (n-l ) 2 + (re-l) 2 (?j- 2 ) 

3*3*5 18*0*3*2 , 27*5*1*2 162*4*3*1 27*5*5 18*5*5 102*5*2 108w*3*5 ■ 

w 2 n(w— 1 ) n(n-l) (w—l ) 2 w(w~ 1 ) (n—l)(n — 2 ) (n l ) 2 (n— l) 2 (w— 2 ) 

Add. h + l = 5

10*3*2 
n 3 

6 * 4*2 4*| 
n 3 n 3 

3*5*2 7*4*3 14*3*1 

n 3 ra 3 n\n— 1 ) 

*3*2 6*5*3 3*5 18*4*5 (28n —32) *5* a 8 4 

n 3 + w 3 n s n\n-\y n\n- 1) 2 ^(n-l ) 2 * 2 

4*3*3 6*5*4 12*5*5 72*i* 3 *- a 16(w- 2)* 3 80*3*5 

w 3 ?i 3 1) w 2 (»t —1) + n 2 (^-l) 2 n(w-l) 2 

10*3* a , 30*3*5 , JL20*5* a , 320*4* 3 , 40(w-2) , 80(n-2) a , , 160*1 

n 3 n 2 (n - 1 ) n 2 (n - 1 ) + n{n - 1) 2 + n*(n - l) 2 * 4 Ks + n{n -1 ) 3 * 3 K * + (n - l) 3 





393 


k(3 l 4 ) 
a'(3 2 1 3 ) 
k(3 3 1 2 ) 


K(r’h l ) 
4 1°) 

K( 21 6 ) 

■ at( 2 2 l 4 ) 
/c(2 s l a ) 
/c(2 4 l 2 ) 
*(2«1) 
*( 2 ‘) 

4 31=) 
/c(3 a l 4 ) 


Table 1 (cont.)
h + l = 6

6/c 5 ^ 2 , 4 at 8 at 4 

-S-1- 

n U 


3k 7 /c 2 | K a K 3 6 K i ic i t 27 /c s a:| i 90/c 4 /c a A: a l 9/cl 


w 

6/c 7 /c 
n 


,3 + w : 


i 3 

AToAT, 


+ 


60/c,A:i! 


7 ft l | *0 

n 


27 


w 2 (w-l) n 2 («. -1) T n 2 (n - 1) + w(w -1) (n - 2) 


3 "*"n a (n —1) 


k,kI + 


.2 , 3a: 5 /c 0 27(3n-4 )k 6 a: 3 k, 


+ 


27 


n\n- 1) 


KukI+k^kI 


18 


3n,—4 )/c 6 k 8 a: 2 ( 81 

n\n- 1) 2 + * , *«' f V(»_ 

54(4n — 7) \ 162a: 2 /c. 


27(4n —7) 
l) + » 2 (»-l) 2 


+ /c 4 ac 3 /c| 


166(6w —12) 


___7)_\ 

Mn -1) (n - 2) ' r w(n -1 ) 2 (n-- 2)J + »i 2 (« -1) 


; + 


108 


36(7n 2 -30/1+34) 


Mn- 1 ) 2 (ft-2) ^ n(n -1)m(-2)^ + n(n - 1 ) 2 (n - 2) a ' K *‘ K * + (n- 1) 2 («-2) a K * K ' 1 


108(6n—12) 


K»* 4 


Add. 
15 d 


A+ 1 = 6. Corrective terms involving -4 onlv 

n Q 


n* 


15A- a /c 2 

ft 3 

12^ 2 a: 2 3a: 4 a: 2 6/c 4 

ft 3 n* + n 2 (n-1) 

9/c,iAC 3 /c 2 6/cf 1 8 /c 3 ac| 

» 3 + n a + n 2 (» — 1) 

12* 4 k 2 24k 2 * 2 3^/c, 12/c 4 /c| 12 k 4 

u 3 n a (w— 1) n 3 n. 2 (n-l) ft(H.~l) 2 

15/c 4 /c 3 6Q/c 4 /c 3 /c a 60/c a a -4 

n 3 "ft 2 (ft—1) ft(w — l) a 
15 a: 3 90* 2 a: 2 , 180/c 4 /c| 120 k° 

n 3 ^~n 2 (n— 1) n(n— 1) 2 "*"(h— l) a 
15 k 4 a: 2 
n 3 

3a: 2 a: 0 12a: 4 a: 2 27/c,,a: 2 27/c| /c* 18k| 

?i 3 + n 3 + « 2 (ft-l) + ft a (n-l) + ft(ft-l)(n-2) 


REFERENCES 

Eisheb, R. A. (1928). Proa. Land. Math. Soc. Ser. 2, 30, 199. 

Kendall, M, d (1940a). Ann. Eugen., Lond., 10, 106, 

Kendall, M. G. (19406), Ann. Eugen., Lond., 10, 215. 

Kendall, M. G. (1940c). Ann. Eugen., Lond., 10, 392, 

Kendall, M. G. (1942). Ann. Eugen., Lond., 11, 300. 

Kendall, M. G. (1943). The Advanced Theory of Statistics. I. Chas. Griffin and Co. 
McKay, A, T. (1932). J.B. Statist. Soo. 96. 696. 

Peabl, R. (1905). Biometrika, 4, 38. 

Wishabt, J. (1929a). Proc. Lond. Math. Soc. 29, 309. 

Wishabt, J, (19296). Proc. Roy. Soc. Edinb. 49, 78. 

Wishaet, J. (1930). Biometrika, 22, 224. 

Wishabt, J. (1933). Biometrika, 25, 52. 



[ 394 ] 


THE MOMENTS OE THE 2 AND F DISTRIBUTIONS 

By F. N. DAVID 


1. The cumulants of Fisher’s 2 distribution were derived approximately by Cornish & 
Fisher (1937) and exactly by J. Wishart (1947 a) some years later. In both cases, however, 
the assumption is that the parent population (or populations) generating the samples is 
normal. It appears simple, using Fisher’s /c-statistic technique, to derive approximations 
to the cumulants of 2 and F when the two estimates of variance involved are based on 
independent samplings from two parent populations winch may have any distributions 
whatever provided the cumulants exist. These approximations to the true cumulants can 
be used to investigate the effect of non-normality in the parent population on the distribu¬ 
tions of z and F, or to obtain approximations to the power of the z and F tests with respect 
to a set of specifically defined alternative hypotheses. It should be noted that in the type of 
problem we have in mind, the two estimates of variance are essentially independent or at 
least uncorrelated. In many problems in the analysis of variance non-normality in the 
parent population introduces a correlation between the estimates which are independent in 
the normal case (seeE. S. Pearson (1931) andR. C. Geary (1947)). We are not concerned here 
with this latter problem. 

2. It will be assumed that there are two populations n x and rr 2 , each of which may be 
described by cumulants up to any order desired. For the cumulants of n x we write k x , /c 2 ,..., 

and for those of . Following MacMahon we define the one-part symmetric 

functions as •" n 

s r = S *$> 

3-1 

where n is the number of magnitudes involved. Hence if we imagine samples of sizes % and 
n 2 randomly and independently drawn from n x and 7r a respectively, and if we associate with 
each element of the sample a random variable aq, aq,..., x ni , x' x , x' 2 , ,,,, 3 ;^, we shall have 



for and a similar interpretation may be given for 7c 2 and tt v 

3. We begin by expanding 2 : 


log .p “ lo & 





It is clear that expansion of the right-hand side will only be valid provided 

lc 2 < 2k 2 and fc 2 <2/c 2 . 

The general question of the validity of expansions of this type has been attempted only 
by J. B. D. Derksen (1939), but it appears possible to justify the use of such expansions 



F. N. David 


395 


for reasonably large n in an approximate sort of way. We argue for the expansion of 
log a (1 + (&2 — K 2 )! K sd on ly> but it is obvious that the same kind of reasoning may also be 
applied to log e (1 + (k 2 — k' 2 )Ik' 2 ), 1c 2 is distributed with mean x 2 and variance crf v where 


_ x, 2xi 
ar i‘*-Z~ + 


% %-l 

For any reasonable sized n x it is clear that 

W 1^-1/ 

and, in fact, % may be so chosen that for some fixed positive integer r, 

/x 4 2x1 y 

r\ — +-M <x 2 . 

W %-1/ 

Provided r > 3 it is clear that for reasonable-sized % we shall have 

P{k 2 >K 2 + rcr, J<e, 

where e is some small positive fraction. Hence it appears, for % of reasonable size, the ex¬ 
pansion will be valid, except in a small proportion of cases, for most distributions met with 
in statistical practice. An alternative approach would be to regard the distribution of k 2 as 
truncated at the point k 2 = 2x 2 , the moments derived being applicable to this truncated 
distribution. 


2z-log„J = cc-cc', 


4. If we write 

where. a — log e ^l +^—and a! = log c j|l + —, 

then since a and a' are independent we shall have 

^) = flogep + W-K(«'). 

*2 

x 2 (z) =crl = i(x a (a) + x 2 (a')), 

X;i(z) = — ))> 

X 4 (z) = "b ^4(® ))■ 

Further, since a and a! have the same functional form, we may derive the moments of a and 
obtain the moments of a! by adding the appropriate dashes. We consider then 

K - K 2 1 ( fc 2 “ K z) 2 , 1 {h~ K 2? 


i /i , ^2 ^-2 \ ^ 2 K 2 M ^2 ^ 2 ) , ^ (*2 

“= l08 -l 1+ -hr) ^ —- 2 - 4 - 


and take expectations. Using the notation 

<^ 2 -*2 ) r = (n 


it is immediate that 




3x 3 a ' 


4x1' 



396 The moments of the % and F distributions 

From Fisher’s or Kendall's tables of product cumulants and the tables of corrective factors 
recently given (David, 1949), we may write down immediately the right-hand side of the above 
expression, neglecting terms of higher order than 1/wf: 

1 1 x 4 / w|-4?i 1 + 7 \ 4/ 4K-2)(%—7) \ kJ %-7 \ 

1 3(%~1) 2 xlUnjK-l) 2 / 4\ S^K-l) 3 j 1)J 

xl/ 3wf-13w|+9n x + 9 \ _^g/_£§_ gKng) 

k\ ( 4wj(%-i) 3 / 4 Uk%-i) 2 / 4^1 *1 jj-iK-i) 2 

x c x 4 2 <c| 5 

/eg x|2«i’ 

We introduce the notation y r = -3S, / 4 = % -1, 

K 2, 

and write the expansion of $(&) in terms of y and / 4 up to order l//f. Thus 
<%) = -^r(2 + y 2 )+^r 2 (-4+ 18 Ta+ l6y!+4y 4 -9yl) 

+ ~ [ - 42y 2 - 128y| - 32y 4 + 30yl - 96y 3 y x +96y a yf - 3y B + 24y 4 y 2 - 30yl]. 

The expression for &\aJ ) will be similar except that / will carry the subscript 2 and the y’s 
will carry dashes. If we write 

fTrl 1 = 7r„^ 

IfU n n 

we shall have 

*(*)- ilog^-[^ ! (2+y 2 )] 1 _ 2 +~[j 2 (-4+18y 2 +16y?+4y 4 -9y 2 a )J_ !! ' 

+ ~ (- 42y 2 - 128y| - 32y 4 + 30yi - 96y 3 y 1 + 96y 8 y 2 - 3y 6 + 24y 4 y 2 - 30y|)I j_ ^. 

5. The higher moments of z follow in a similar way. We use the expansion 

(log (1 + x)Y = A r _ 0 x r - A r+lix — + A r+2i2 jpp^f+2) ~ Ar+Z ‘ 3 (T + lT^ + 2) (r + 3) + '"’ 
where the coefficients 4 r s are given by Table 1. 


Tablet. Values of A r s 


\ r 

s \ 

1 

2 

3 

4 

5 

6 

0 

1 

1 

1 

1 

1 

1 

l 


1 

3 

6 

10 

15 

2 



2 

11 

35 

85 

3 




6 

50 

225 




F. N. David 


397 


For S(a?) we have 


S\a?) = -g (2 2 ) - 

k 2 


i(2 3 ) + 


whence on substitution 


vLL (24) _ JL 

12/cf j 6/c| 


( 2 6 ) + 


=_i_ +_L-_ + _ L_ , ^ (34-9^ + 13) Kg jgy-lg) 4 4( Wl -2)(3^-19 ) 

v wi- 1 K- 1 ) 2 K- 1 ) 3 4 3 t?, 1 (^ 1 — i) a 434^-1) 4 sji^-i) 3 

I 4(334~ 1 254 + 63w 1 +117) k r 11 25 x 5 /c 3 88(7+—2) 

'4 i2»?K-i) s 4124 4^3^| + “4' 3 4( Wl -1) 2 

4*1 1 00^-2) _137/c| 

4 34(4 -1 ) 2 124 4 

We may expand as before in terms of the -y’s and powers of f v from this write down «?(<x' 2 ) 
immediately, correct each expansion for the origin not being at the mean, and finally obtain 

4 = \ [- f ( 2 + 7 2 )J +2 -+1 [j2 ( 4 - 672-871-2^ + 64)]^ 

+ i [? (16 + 8472 + 384 7 i + 96 74~ 92yl + 352737^ 384y 2 7|+ lly 6 - 96y 4 y 2 + 128y|)7. 
Further expansion and algebraic reduction gives 

* 3 ( 2 ) = ‘s' [ J 2 (- 4 + 4 7i + 7* “ 3 7i)]^_ 2 

+ Yq [ p (- 1 6 - G4 7i ~ 16y 4 + 12yl - 96737! + 120y 2 y? - 3y 6 + 30 y 4 y 2 - 44y|)J 
**(*) = Ye [7 3 ( 1G + 8 7l + 32 737! - 4 87a7!+79 - 12y 4 y a + 20y!)]\ 


It is dear that to obtain an accurate result for * 4 (z) it will be necessary to take further terms 
in the original expansion in order to be able to obtain the result to order l// 4 . 

6. The arithmetic involved in the expansions is heavy, and it does not seem possible to 
obtain a general check. We may note that if the two samples are assumed to have been drawn 
from different normal populations, then 


<^(2)-£l°g c 


4 

4 


1 j._i_ j_ 

2/i + 2/ 2 6/? + 6/!’ 


_1 J. J_ J_ _L _L 

* a(Z) ~ Wx + 2/2 + 2/? + 2/i + 3/1 + 3/1 ’ 

_ 1 _1 _ _1 1 _ 

/C3(Z)_ 2/ 2+ 2/| /? + /r 


When * 2 = 4 , or when the samples may be supposed to have been drawn from the same 
normal population, the results agree with those given by Cornish & Fisher* (1938), We note 


* The Cornish-Fisher results can easily be calculated to any order in 1// that is desired. See Wishart, 
(1947a, p, 174). 



398 


The moments of the z and F distributions 

the interesting property of the z-distribution, that all oumulants except the first are unaltered 
by a change in the ratio of the variances of the two normal parent populations, a fact which 
may be used to determine the power of the variance-ratio test against a set of specifically 
defined alternative hypotheses. As a further small check we may take only terms of order I jn 
in al, whence 


a result derived by Geary (1947). 



7. The exact values of the cumulants for the case when the parent population is normal 
have been given by Wishart (1947 a) . Further to this he showed that a closer approximation 
to the true cumulants (after the first) was obtained by expanding in powers of l/(/— 1) rather 
than in powers of 1//. For numerical work we shall therefore rewrite the cumulants in powers 
of l/(/-1). Let 

r = /— 1 = n— 2. 

Then 

*(*) - i lo ge^ = ~ \j r (2+ r»)] _ 2 + [jp ( 8 + 24 7a+ 16 r!+ 4 7 i - 9 7l)] 


+ [i (- 4 - 84 ra - 160y? - 40y 4 + 48y| - 96y 3 y 1 - 3 y e 4 - 96y 2 yf + 24y 4 y 2 - 30y|) J _ 

K ^ z ) = i [^ 2+ 7i )] +2 +\ ["2 (- 8 7 2 - 2 n- 8 rl + 5 ri)] 




■ 48 \jz (- 8 + 168 72 + 120 7« + 480 7i — 15SSy| + 352y 3 y 4 

+ ll7 6 -96y 4 y 2 -384y 2 y|+128yl)l , 

•_ 1 - 1-2 

**(*) =^[^(~ 4 + 4 7 a i + 74 - 3 7l]_ 2 

+ h “ 80721 “ 2 °74 + 24yf - 96y 3 y 4 - 3y 6 + 120y 2 y? + 30y 4 y 2 - 44 yl) J_ 2 > 


«*(*) = 


16 


(16 + 8y| + 32y 3 y 4 - 48y 2 y 3 + y 6 - 12y 4 y 2 + 20yjj) 


+ 2 


For cumulants of z for samples from a normal population we have, to order 1/r 3 , 

sn = _1_ J_1_1_ 

2r x ~ 2r 2 + 3rf 3r| 6rf + 6r§’ 

, 1 1 1 1 
K ^ {Z) ~ 2r 1 + 2r 2 6 r\ 6 r\' 

K * {z) = _ ^t + 27r 

= ya + ~3- 
'i r i 


Table 2 gives a comparison of the true values, from Wishart, wdth those obtained by sub¬ 
stituting in the formulae for normal parents immediately above. The case considered is 
A = 24 > A — 60. The agreement is satisfactory. 



F. N. David 


399 


Table 2. Gwnulant constants of the distribution of z(f x = 24 ,/ s = 60; parent 'population normal) 




*2 

*3 


Wishart’s exact values 
Approximate values 

-0-0127,429 

-0-0127,431 

6-0301,992 

0-0301,992 

-0-0007,998 

-0-0008,015 

0-0000,867 

0-0000,871 


8 . Geary, in bis 1947 paper, has discussed the effect of kurtosis on the distribution of z 
when both samples are drawn from the same population. As an illustration of the cumulants 
of z derived here we shall discuss the effect of skewness in the parent population for the case 
f 1 = 24,/ 2 = 60. It will be assumed that the parent population may be graduated by the first 
three terms of the Gram-Charlier Type A series,* i.e. that 

where H 3 (X) and H^X) are the third and fourth Hermite polynomials. Tins parent population 
has cumulants 

^1) ^2i ^3! ~ ^6 = 16^31 ^7 = 35^4 v 3 , Kg = —35 k|. 

In order to eliminate as far as possible any effect of kurtosis wo shall put k 4 = 0, when the 
pnly higher oumulant which has a non-zero value will be k 6 = -10/dj, Under this assumption 
and assuming further that k 2 = 1 the momental constants of the distribution of z for different 
degrees of skewness are as given in Table 3. k 4 (z| k 3 .) is unaltered at the degree of approximation 
to which we are working. It would appear from a study of these moments that the effect of 
skewness on the distribution of z is likely to be small, and this is, in fact, the. case. There are 


Table 3. Momental constants of z (f t = 24, / a = 60) when /c 4 of parent population is zero 


A* 2 

K 3 

0-0 

. 

0-1 

0-3 

0-5 

/dkk 3 ) 

/hskN 

cr(z|x 3 ) 

Kakka) ■ 



-0-012,992 

0-030,787 

0-175,462 

-0-000,988 

-0-013,158 

0-031,179 

0-176,576 

-0-001,113 


various ways in which we can estimate the effect of this skewness. We could use the moments 
of z|k 3 to find the Pearson curve with the appropriate yd’s, or follow the Fisher-Cornish pro¬ 
cedure and use Edgeworth’s series or fit a Gram-Oharlier Type A of the functional form given 
above. T his last procedure is approximate but will be adequate for our purposes. Thus if 


* I owe to Dr J. Wishart the suggestion that a more appropriate form for ’the population would be 
1 in/, , ' . # 4 (X) , toy! H a (X)\ 

T 1+r ‘Tr +r '"*r + — ??—)• . 


I agree that for any systematic investigation of the effect of skewness in the parent population on the 
distribution of z it might be better to take this functional form. In the case above, however, I am 

specifying a particular population and my purpose is one of illustration only. __ 

Biometrika 36 






400 The moments of the z and F distributions 

z 0 . 05 is the deviate beyond which 5 % of the frequency might be expected to lie when k 3 of 
the parent population is zero, it is seen that we require to evaluate 


<E> 


1 f“ 

J(2n)J 


.-j— e-^dX+^^r 

<y (27T) J ^o«os~ *5, J Z<j>i 

V[#fi(«ki)] 


■HaW-TTS-t 

-*i(g|*,) a/( 277 ) 


e-^dX 






remembering that 


VtA' a (s|A- 3 )l 




9 . As a check on the adequacy of the Gram-Charlier Type A we first refer to the tabled 
values of z, finding that 

z 0 . 0B = 0-26535 for k 3 = 0, ff = 24, / 2 = 60, 

and then proceed to find the tail area cut off by z 0 . 05 for these values using the expansion just 
above. We find 

$(/c 3 = 0) = 0-05477-0-00439-0-00031 = 0-05007. 


Thus the representation of the z distribution by the normal curve plus the first two corrective 
terms of the Gram-Charlier A gives the probability integral correct to three decimal places, 
which will mean that it gives sufficient accuracy for our purpose. We repeat the procedure 
for K% = 0-1, 0-3, 0-5 and draw up Table 4. It is clear that a moderate amount of skewness in 
the parent population will not affect the z test appreciably for the case considered (/, = 24, 
/ 2 = 60), and this will be true for higher values oif r and / 2 . 


Table 4. Tail areas corresponding to z 0 . 0B (k 3 = 0) when k 3 4= 0. 


*1 

0-0 

. «-l 

0-3 

0-5 


0-050 

0-060 

0-051 

0-051 


10. To investigate the effect of skewness in the parent population for degrees of freedom 
less than those taken in the preceding paragraph, the procedure is complicated a little by the 
fact that the cumulants of z are of order 1 /?i 3 only, and that the Gram-Charlier A does not give 
exact results. It is perhaps useful, however, to give such calculations as were carried out. We 
consider the case ff = 8, / 2 = 16, and take only the normal curve ancl the third Hermite 
polynomial to represent the distribution of z. Carrying through the calculations as before 
we may draw up Table 5. Tor = 0, O should be equal to 0-050 and not 0-051, the 


Table 5. Tail areas cut off by z 0 . 05 (= 0-4760), ~ 8 ,/ 2 = 16 


*1 

0-0 

0-1 

0-3 

0-5 

$ 

0-051 

0-052 

0-063 

0-054 





F. N. David 


401 


inaccuracy being introduced partly by the moments and partly by the expansion. 
However, the figures are, I think, sufficient to show that for the case considered a small 
amount of skewness in the parent population will not affeot the z test very much. 

11. The moments of F can be obtained by precisely the same method as was used for 
obtaining the moments of z, although the results are not as satisfactory and the algebraic 
manipulation required is very much heavier. We have 


F = ^ 


; W) 


&2 ^*2 




the dashes being given to the numerator this time to avoid much repetition. Thus 

and on substitution we have, reducing as before, 

k'z T, 2 4 8 /I 1 6\ Jl 2\ 

} V a L 1 + /a + /i + /l + \7» 71 + 7l) - 7 VI + 7l) 

‘ (l 6 \ (3 4\ 32 , 1 ,40 10 ..lSl 

Tal /a)'by2lj?2 ^3 I + Y3Y1 ^3 dye m YzYiw YiYz *3 dy a j.3 ■ 

\j2 J2J \J 2 /2/ J 2 /2 J 2 JZ J 2J 

Collecting up terms we may write 

/(j) = ^ri + T(2+y*)+a(4-y«“y5-y»+ 8 yt) 

k 2 > L j 2 7 2 

+ i (8 + 5y a - 2y* + 6y 3 - 4y 2 + 32y 3 y 1 + y 6 - 40y a y| - 10y 4 y 2 + 15y|)J. 

Similarly by expanding <??(F 2 ) and $(F 3 ), correcting to moments about the mean, and 
collecting terms we obtain 

4 = ^ri(2+y^)+7(2 + y i! )-^H-i(16 + 7y a -8yf-2y 4 +8yl)+ ? ^(2 + yi)(2+y a ) 

K 1 LJl J 2 Jl J2 JlJ 2 

,Yz 3y^(2 + y a ) , (2+y a ) t 


' + '~Tn ( 28 + ^ ~ 16y i -47/3 +15 ^) 
/1/2 


4?2 f 

Jl JlJ2 

+ -i (88 + 37y a - 32y 2 + 38yl + 96y 3 y 1 + 3y 6 -152y a y| - 38y 4 y a + 69y|) , 

1 2 J 

:) 


1 /. <j t u » j 2 /I « 1 yo/i' /o ■ - 

J 2 

ft(F) = ^ f- (yi ■+ 4yi a + 12y' + 8) + ^ (2+y a ) (2+yi) + ^ (16 + 12y a - 4y a -y t + 6yS 
L/i /1/2 ■< 2 


(- 2yi - 8yl 2 - 1 2y') + (Yi + ±Y? +1 lyi + 8) 


+ Tn ( 2 + ri) ( 56 + 3 4y 2 - 20y?- 5y 4 + 24yi) 

JlJ 2 

+ V 256 +180y a - 208yf - 28y 4 + 168y! + 96y 3 y 4 + 3y e - 204y a y| - 5 ly 4 y a +116y 3 a ) J. 

ft J 

h{F) is of great length and complexity, and for this and reasons given in the succeeding 
paragraphs we have not written it out here. 

26-2 



402 The moments of the z and F distributions 

. 12. It is clear that except for large values of % and the expansion gives expressions 
which are not very good approximations to the true moments. Moreover, it is not thought 
that retaining higher powers of 1// in the expansion will help very much. The numerical 
coefficients are increasing so rapidly that for small f the quasi-asymptotic series begin to 
diverge before they have converged to any quantity close to the true value. This is perhaps 
most easily seen if we consider the case of the moments of F when the two samples have been 
drawn from the same normal population. For this case the true moments are easily found 
to be (see, for example, Wishart (1947 b)), 

ft , W 1 +/.- 2 ) „ m 

M ] ~fz~ 2’ ^ )_ A(/ 2 -2) a (/ a -4)’ \fj (A-2)3(/,■ -4)(/,- 6) • 

In the expansion in § 11 we write /c 2 = A and put all the other k ’s equal to zero. We have then 

S’(F) = 1+T + 71 + H> 

J 2 J 2 12 


(Tj 


/I IN 12 16 56 

" lA + A/ + AA + /l + A/i 


+ 


96 336 


88 

fV 

256 


_8 _24 16__ 

/4 3 (^ „2+J?2 ■ 

Jl ill2 /2 J 1/2 111 2 J 2 

These approximate moments may be checked by expanding the true moments as power 
series in l/A;-and 1/A', whence the leading terms are found to agree with the approximate 
values just given. Numerical substitution shows, however, that these approximate expres¬ 
sions do not agree very well with the true values even for f x = A = 20. 

13. Possibly a better approximation to the moments of F when the samples are from any 
two parent populations may he obtained by the following artifice. The moments of F when 
the samples are both from the same normal population are given by those terms inside the 
brackets which are not multiplied by any y’s. We know (Wishart, 19476) the exact values 
ir'i(F N ), /r 2 {F N ), /■i, i (F N ) i for which these values are only an approximation. Let us write, 
therefore, 

m = | { 72+y!+r3-3yl) 

+ j| (6y 2 - 2yf + 6y 3 - 4y| + 32y 3 y x + y 6 - 40y 2 yf - 10y 4 y 2 + lSyf) J, 


^2 \ , 7?j_—_A 2 jQ-.2 

„ 2 P'2,\Fn) t f i f 4“ f 2 ( y 2 »7l 

A 2 L 


A A , /! 

1 72 3y'(2+y a ) 

T j?3 fz f -r 

J 1 J lJ 2 


n 

(2 + y()(28 + 9y a 


2y 4 + 8y|) +jj (y£(2 + y a ) + y a (2 + y 2 )) 


■16yj~ 4y 3 + 15y|) - 56 


A/1 


Ps(F) — 


x 3 
K 2 


+ji (37y a - 32y 2 + 38yi + + 3y 8 - 152y a y| - 38y 4 y 2 + 69yl)J, 

A3(Av) + j2(A + 4yj 2 + 12y')+^±^ ^ + ^ - 2 4 + l(12y 2 -4yf-y 4 +6yI) 
J 1 J 1 J 2 J2 


~f\ 


:(2yi+8y( 2 +12 yi) + 


^(2-ry a ) (74 + Hy a + 8) — 96 

AA 


+ 


3 (2 + y 2 ) (56 + 34y 2 - 20yf - 5y 4 + 24y|) - 336 


A/ 1 


+ 


(180y a - 208y 2 - 28y 4 + 168yf + 96y 3 y 1 + 3y 6 - 204y a yf- 51y 4 y 2 + 116y|) . 



F. N. David 403 

In these expansions the y s play the role of terms which are correcting the true normal 
moments for the departure from normality. Provided y x and y 2 are less than unity these 
expressions should give reasonable approximations to the true moments of F when the 
samples are drawn from any two parent populations whatever. It will be noted, however, 
that in ft z (F) the numerical coefficients are rather large, indicating that the approximation 
will not be too good. This remark holds good a fortiori for /t 4 ( F). 

I should like to thank Prof. E. S. Pearson and Dr J. Wishart for helpful criticism. 

REFERENCES 

Cornish, E. A. & Fisher, R. A. (1937). Revue, de l’Institut International de Statistique, 5 , 307. 
■David, F. N. (1949). Biometriha, 36, 383. 

Derksen, J. B. D. (1939). Ann. Math. Statist. 10, 380. 

Geary, R. C. (1947). Biometrika, 34, 209. 

Pearson, E. S. (1931). Biometrika, 23,119. , 

Wishart, J. (1947 a). Biometrika, 34, 170. 

Wishart, J. (19476). J. Inst. Actuar. Skid. Soo. 6,172. . 



[•404 ] 


t he method of frequency-moments and its 

APPLICATION TO TYPE VII POPULATIONS 

By HERBERT S. SIOHEL 

National Institute for Personnel Research, South African Council 
for Scientific and Industrial Research 

Part 1. Theobetioal 

The method of frequency-moments was developed primarily with the object of overcoming 
certain difficulties when fitting growth curves to observed data. In the author’s original 
paper (1947) examples were given on how to fit exponential, logistic and Gompertz functions. 
In the second part of the same paper it was pointed out that the method could also be applied 
to the graduation of frequency-distributions and examples of normal, lognormal and 
Pearson Type VII curves were presented. 

Yule’s (1938) investigation, based on an early suggestion of Karl Pearson, has come to 
the author’s notice recently. The fitting process described by Yule is essentially the same 
as my method of frequency-moments. The derivation of the standard errors given on the 
following pages is based partly on Yule’s work. 

In this paper we are more interested in the problem of estimating the parameters of a 
given type of frequency-distribution by the method of frequency-moments than in fitting 
the proposed distribution to a set of observations, though this aspect is dealt with in the 
Last section. The present investigation has been limited to the case of frequency-distributions 
only, and the section comparing the maximum likelihood solution with the frequency- 
moment method has been confined to a Pearson Type VII population. ' 

Definition of parameters 

The nth frequency-moment of a population represented by a probability law 

ydx=f(x)dx 

f+«> 

will be defined as J n = N n j y n d%, (1) 

where N is the total number of items in the sample and is equal to J v Corresponding to (1) 
we shall define the nth probability-moment of the population as 

J f +c0 

= y n dx. (2) 

0 1 J -CO 

In practice we are rarely in the position of estimating directly. However, it is com¬ 
paratively easy to estimate the parameter 

n / C + K \ n r+l 

+ V d A =2>f, say. 

\Jb 1 i=0 



( 3 ) 



405 


Herbert S. Siohel 


By suitable choice of a and b we can make 

C a |*+C0 

rr 0 = ydx and n r+1 = ydx 
J ~co J b 

smaller than any preassigned e, so that in practice 


r 


i=l 



r 


- 

i=l 


(3a) 


where r is the number of classes of equal width h = (b — a)jr into which the population has 
been subdivided. 

In the limit we have lim h 1 -” <u„ = Q„. 

, »*' 71 

—>0 


For computational work it is convenient to make h equal to unity. Even for r as small as 15 
the approximation 

is reasonable for most oases provided n is of a low order. It may be improved by the use of 
a correction to be described in Part 2. The parameter (o n is hereafter called the nth working 
probability-moment. 

Certain ratios of working probability-moments are defined as 


and 


a n = ^I (»- 1 , 2 , 3 , 4 ,...), 

„ _ , _ i s b 7 \ 

a ” ~ 6 j | (27l+1) ~ 2 > •■■1 


(4) 


The a n coefficients may be used as measures of kurtosis. 

The dispersion of a population may be represented by parameters of the nature 


Pn-i ~ . > 


(5) 


subject to certain limitations mentioned in Yule’s (1938) paper. 

When scale and location of a distribution are changed we transform the variate by 


z = kx+l. 

The transformed distribution becomes 

(z-l\ 1" 

and ^ w -skL 'h-H • 

cujz) = ( 6 ) 

It follows that the working probability-moments are independent of the constant l and have 
to be multiplied by k x ~ n if the variate is multiplied by k. They are, therefore, semi-invariant 
under the transformation z = kx + l. 

The cc n coefficients are unaffected by the transformation, as can easily be shown by 
substitution of equation (6) into (4). 



406 Meihod of frequency-moments 

In the following, extensive use will be made of working probability- moments. In practical 
applications, however, we usually deal with working frequency-moments,. In this fact lies 
the justification of the name.suggested;for the new method. 

Large sample mean values, variances and covariances of statistics o n and a n 

T'h 1 

We may estimate co n by o 7l = pf, (7) 

i=0 

where p i = fJN = observed proportion of observations falling into the ith class interval. 
Denoting the deviation of p i from its mean value ■n i by 

hi = Pi- 

we have for the mean value of o n 

E(o n ) = HE{p?) = I,E{Tr i +8p i y i 
■ . : = ZE{nf + mrl^dpi + 8p\ + ...), 

where n M is written for . Neglecting higher powers of 8p { as E(8pf) will be of order N~ 2 , 
and using results obtained from the binomial theorem. • 

Em = 0 and E{8p\) ^ 1 ^ -, 
we find E(oJ = £ ^ (nf- 1 - <)] + 0(N~*) 

■,=o> n + T fm 1 -w n ) + 0(N-*). ( 8 ) 

For large N E\o n ) =o) n + O'iN- 1 ). (9) 

Writing 8o n for the deviation of o n from its mean value E{o n ), we have 

K = o n -E(o n ) = m-Ln?+0(N-i) 
i =L[n^-Hp^n^-Hp\-\ + 0{N^). 

After squaring this expression and taking mean values we have 

^(o^^niLnf^EiSp^+^m^WPiSp^ + OiN^), 

where i =j =j and all permutations are permitted. On substituting the value for E{8p\) given 
previously and also the covariance 

E(8pm = 

) 

we obtain var ( o n ) = — [Imf 1 - 1 - Lnf 1 - + 0(iV' 2 ) 

2 ' 

= ^ [ W 2m-1 ~ w «] + 0(N~ 2 ). (10) 


As an example, let us find the efficiency of the method of frequency-moments in estimating 
the standard deviation cr of a normal population. For class width h — 1 



e-^l^dx 


= (2 



. Herbert S. Sicrel 407 

o', expressed in terms of working probability-moments of order f (n = f for reasons to be 
explained at a later stage; see equation (53)), is 

1/2 

We can, therefore, estimate cr by s' = - o^ 2 . (H) 

Writing (11) as s' = ~ [E(o^) + to*]- 2 , 

and expanding the bracket for small deviations of So i such that 




It follows from (12) that s' is a consistent estimator. 
Further, for small 8 

var(s') =— wf 8 var(o„), 


and hence, from (10), 


var ( 5 ') = ^ 7 )W it 6 [w 2 -wf] 
= |(fV2-l)cr 2 . . 


Hence the efficiency of s' is given for large N by 


Eff. (s') = ■ 


■tM 


= 0-916, 


which is good in comparison -with the efficiencies of other estimators sometimes used in place 
of the moment-statistics. . . 

The covariances of working probability-moments are derived in a similar way to the 
variances. We find 

COV (o n , o m ) = y - (ji n 0J m ) + 0(N- 2 ). (15) 

The cc n ratios of the worldng probability-moments may be estimated by 

(n=l,2,3,4,...), 1 ■- 

°2 /ic\ 


_ q l(2«+5) ( __ l 3 5 7 \ 

a n ~ 0 |( 2 n+l) W — t> 2’ 2<2’ ■•■> 


Now if a n = <p(o k , Oj), we find in the usual way 


/dd>\ 2 . d<t>dd> . i^4>\ 2 , . 

(stJ 


(17) 




408 Method of frequency-moments 

This result is correct to order N~ 1 . For n = f we find 

var (a|) = jy. [4aj + 9af -a f — 12a s ct|] + 0(N~ Z ), (18) 

and for n = l var(%) =^[9a 3 +16a|-a|-24a 1 a 2 ] + 0(iV'“ 2 ). (19) 


First three exact moments of statistic o 2 

With a view of getting some idea as to the sample size N for which one may expect the 
statistics o 2 and Oj to be normally distributed, it was originally intended to derive the first 
four exact moments of o 2 . It was assumed that if for a given N the statistic o 2 is nearly normal 
it would not be unreasonable to expect Oj, being of lower order, also to be normally distributed 
for the same N. The algebra, however, was found to be extremely heavy. In the following 
the first three exact moments of o 2 are given only. The fourth moment of o 2 , derived below, 
is correct to order jV -2 . 

In general we have 


F(o n ) = 'ZE(n i + 8p i ) n = I l E{n2+mT2~ 1 8p i +n (2) iT?- 2 8!pl+ ...), 

and 

8o n = o n — E{o n ) 

= n ZnrVFi ~ WPi)] + %) ZnrVpt ~ Wpl)] + %) *nr a m “ WP\)] + • • ■ 
In particular, y{{o 2 ) = E{o 2 ) = <y 2 + ^(l ~w 2 ), 

being the exact mean value of o 2 , and 

So 2 = 2E7r i d ft + Sdp|+^ 


( 20 ) 

• ( 21 ) 
( 22 ) 


N 


Hence the exact wth moment of o 9 


/*>*) = mot) = ^2S7r,^+S^?+ 


(23) 


For the second moment we find 


/t 2 (o 2 ) = S E{8p$) + HE(8p\Spl) + 4 St r i E{8$) + 4 Stt* E{8p i 8pf) + 4 Stt? E{8p f) 

+ 4 & vr, * (*p,*p,) + SU(dp|) + , (24) 


where i and all permutations permitted. In order to evaluate (24) we must know the 
central bivariate moments 

= E{ 8 ^ 8 pf) = ~mE( 8 ft 8 f?>) 

of the multinomial distribution 

ati 

P{fl, hi fa,-; fr) = J f f 2 \f s \ - J] 7r l 17r b 7r l 3 ■ ■ ■ ’4 r - 


In general we require for the derivation of fi n {o 2 ) central multivariate moments {n variables) 

/W a ...lCn = E ( S Ph' S Ph S Pf;-- S Pt) ' ' •• 


= mft 8ft 8ft ... 8ft), 


(25) 



Herbert S. Sichel 409 

where the various Vs can take on all the values 1, 2, 3, . r, r being the number of classes 
into 'which the population has been subdivided. The order of the multivariate moment is 

d" ^2 + ^3 + . -. + h n . 

The joint moment-generating function of the multinomial distribution for the case of n 
variables is 

= M+ ir fc A +»i A + -+’ r fc*+»)*. ( 26 ) 

where n = l-n h -n h -7r h - 

The expansion of this expression and the collection of appropriate terms of tfcjkj is straight¬ 
forward but laborious. The various multivariate moments derived from (26) are moments 
about the origin with respect to variables f ln = Np k . They are denoted as v' kj h * k . The 
following moments are needed for subsequent work. Writing i; j, k, l for l v l 2 , l it l x and 
m = N{N- 1)... (IV - r +1), we find 

"lOOO = 

v’lom = Nn i+ NWnl 
v' im = Nn^mnl + N^nl 
v^o^NrT. + lN^+m^+mTrl 
viooo = ^ + 15 JV<%? + 26 W(%?+ 10 ^ 7 rt + W® 7 i, 
rjoao = Nit i + 31 fV< a) 77 * + 90 JV ( 3 ) 7 rf + 6 Si W>n* + 15 JV (B) 77 * + W (#) 77 $, 

"iioo = NWlTilTj, 

K* .= + N^npn), 

v[ m = NVh i n j +SN®n i rf+N®n i rf, 

^1400 = N^TTiTTj + 77* + QN^ 77* 77* + iV (5) 77*77*, 

"2200 = N^nj+N^n^+n^+N^nlTr^ 
v' mo ~ W 2 h i n j +N< a h i n :i {n i + S-nj)+N^TTirfiSni+nj)+ 

"moo = N^n i n i +iV«%*77*(77* + 777*) + 2V<%*77f(7TT* + 677 *)+ 77 ^( 677 * + 77*)+ N^n\ir], 

"mo = iV<%*77*77*, 

"i2io = N^TTtiTjiTk+N^hr^ir,,, 

"1220 = N^b i n-n k + m i \n J n h (n j +n !c )+N ( - r> \ 7 T 2 j TTl, 

"2220 = iV (3) 77*77*77* + W< 4 >77*77*77*(77* + 77* + 77*) + W< 6 >77*77*77*(77*77* + 77*77* + 77*77*) + N^Tf] 

"Illl = N<% i 7r 3 .77* 77 z . 

The remaining moments are easily obtained by permutations of k v Jc 2 , fc 3 , & 4 and the 
corresponding i,j, k, l. 

The central multivariate-moments v* lfca * a ... * n can be obtained from the symbolic identity 
"W- 3 ...^ = ("+M/ 1 ("+#^ a ) fca (v+N7T h p... {V+Nir h )\ (28) 

this being a generalization of the formula given by Kendall (1945) on p. 79. Finally, we have 





410 Method of frequency-moments 

The transfer of the moments is again connected with a great deal of tedious algebra. The 
central moments required are: 


Aiooo — 

A 2 OOO ’ 

A 3000 = — 377*+ 277*), 

>4000 = - 277*+77f) + -p77*(l - 7?7*+ 1277?- 6 t7?), 

Asooo = ^^1(1 ~ 4W* + 577 ?- 277?) + ^ 77*(1 - 1577* + 5077?- 6 O 77 ? + 2477?), 

15 ’ g 

AcOOO = jjy 3 W ?(l — 377* + 377? — 77?) + -^*77* (5 — 3677* + 8377? — 7877? + 2677?) 

+ j^5 7r <( 1 _ 3i77 4 +ia°-77|— 39077?+ 36077?- 12077?), 

1 

>1100 — ~~]y 7T i 7r p 

>1200 = ~2J2 Tr i' rr j(l~% 7T j)> 

3 1 

>1300 = - JP 1 - ~ + 677?), 

>1400 = -;p J,r 4 ff ®( 1 - 377,+ 277?) -~77 t 77^(1 - 1477,+3077?-2477?), 

/‘aaoo = ^ 2 ^ 77^1 77* 77, + 377^77^) -^77*77,(1 - 277* - 277, + 677*77,), 

I 1 

; / f 2300 ~ 77* 77,(1 77* 677, + 577? + 1577* 77, — 2077* 77?) 

I 1 

| ^*77*77,(1 277* 677, + 677? + 1877*77, — 2477*77?), ' 

1 3 

>8100 ~ .^3 ^7 ( — — ^77, + 77? + 677* 77, — 577*77?) 



j + J y4 7 b 7 V ' 1 1777,+ 4277?-2077?+ 4177*77, - 15677*77? + 13077*77?) 

I ~^6 ,r i 7 r 7 ( 1 "• 277*- 1477, + 3677?- 2477? + 4277*77,- 14477*77?+ 12077*77?), 

;2 

j /^1110 ' hi 

1/2 

/‘mo - -^2 W ,W,^(1 - 377,) +-p77*77, 77 & (1 - 377,), 

1 2 

Amo - ~ Jp 7T i 7r j 7r k(~ - 577, - 577 *. + 2077,77*.) + — 7 r* 77 , 77 *.(l - 377 , - 377*.+ 1277,77*.), 

Azaao = 2^77*77,7r,.(l--77*-77,-,77 7c +:377*77,+ 377*77*.+ 377,77*.- 1577*77,77*.) 

1 . . 

^. 4 77*77,77^(3 - 777* - 777, - 777*. + 2677*77, + 2677*77*. + 2677,77*. - 13077*77,77*.) 
2 

+ ^75 7 r i 77 ‘i 7 r fc ( 1 ~ 3n i - ?>7r l ~ 377*. + 1277*77, + 1277*77*.-+ 1277,77*. - 6077*77,77*), 
3 0 

Allll = ^77*77,77*.77*-—77*77,77*77*. 



Herbert S. Siohel 


HI 

The other central moments may again be obtained by permutations. We are now in the 
position to find /t 2 (o 2 ) by substituting appropriate moments from (30) into (24), leading to 


4 2 2 

( w 3 - w a) + ( w a ” 6w 3 + 6w i) ~Jp( w 2~ 4<y 3 + 3w l). 


(31) 


the first term of which is the same as equation (10) for n = 2. Similarly, /i 3 (o a ) can be found 
by using equations (23) and (30), i.e. ; 

g § ‘ 

M°z) = ]ys( 4w 4 + 5wl - 9cO a w 3 ) - — (24 oi 4 - 4w 3 + 3o>\ + 22w^ _ 45 m 2 w 3 ) 


( 33w 4 ~ 24:W 3+ w 2+ 64w|+ 15«f - 144 (i) 2 w 3 ) 
4 

- ( 486 h “ 1 6w 3 + 0>i + 30w| + 9<u| - 7 2w a w 3 ), 

48 

and to order N~ 2 /i 4 (o 2 ) = (cj 3 - w|) 2 + 0(A 7 - 3 ). 

By virtue of equations (31)—(33): 


AW - W). AW - 3+ 


and for large jV 


(«,-«?)» 

A.(°a) 4= 0, /? a (o 2 ) = 3, 


(32) 


(33) 


(34) 


which indicates the convergence of the sampling distribution of o 2 towards normality. 

For the special law , 

for r — 15 (classes) of width \<y = 5 taken symmetrically from x = 0 we have w 2 = 0-139608 
if we consider the class wiflth-as unity in comparison to 0 2 = 0-141047 for r = oo. Further, 


o> 3 = 0-022506, w 4 = 0-003848. 

A comparison is given below between the first three exact moments of o 2 and their approxi¬ 
mations by the first terms in equations (22), (31) and (32) for a sample of N = 1000. 


Moments 

Exact 

First-term approximation 


0-1396 

0-1405 

/*2(°2 ) 

0-1227 xlO- 4 

0-1206 x 10- 1 

/‘s(Oj) 

0-6921 xl0- B 

0-5766 x 10- 8 

A(°a) 

0-0190 

0-0189 


The approximation (34) to the exact value of /? x (o 2 ) is apparently satisfactory for N > 1000. 
In view of the above, /? 2 (o 2 ) cannot be substantially different from 3 for N > 1000. Hence it 
may be concluded that for N > 1000 the statistic o 2 is reasonably normally distributed; 
O j, the other statistic used in practical curve fitting, may also be expected to follow the normal 
law for N > 1000, as it is of lower order than o 2 , The same argument, however, will not hold 
in the case of Analogous to the moment ratio 6 2 , its sampling distribution will probably 
turn out to be skew even for large sample sizes. 






412 


Method of frequency-moments 


Estimation of the parameters of a Pearson Type VII population from large samples 

Fi sher (1921) has shown that the estimation of the parameters of a Pearson Type VII 

population V(m) c 2 ™- 1 

f(x)dx = [ c a+(a:-|) 2 ] m& (36) 

is inefficient in the case of m ^ 10 if carried out by the method of moments. On the other 
hand, the efficient estimation of the parameters by the method of maximum likelihood is 
so cumbersome, even using the modified procedure suggested by Jeffreys (1938), that it 
seems hardly likely that it will ever be adopted by the practical research worker. In the 
following sections an attempt has been made to show that the method of frequency-moments 
strikes a balance between the need for efficient estimation and the practical aspect of the 
method. 

Taking the parameters of (35) in the order £, c and m we find from maximum likelihood 
theory the Hessian determinant 


A = 


(2m— l)m 


(m + l)e a 


0 


2m -1 
(to+ 1)c 2 

_l_ 

cm 


cm 


~d 2 log P(m —-|) a t 2 log F(m) 
dm 2 dm 2 


(36) 


(for definition of Hessian see Kendall, 1945, vol. 2, p. 36). Prom the various zero cross-terms 
it can be seen that/f is uncorrelated with both 8 and m, where f, 6 and to are maximum 
likelihood estimates of £, c and m. Por large N we have, therefore, for either single or joint 
estimation ' ^ ( m + i) c * 


var 


(2m- l)mN' 

Por estimation of c when m is known and for estimation of m when c is known 


var (6) = 


to+1 c 2 
2m-IN’ 


var (m) = 


[F(m-l)-F(m-l)}N' 


(37) 


(38) 


(39) 


writing 


/( to ) = 


c^log T(to + 1) 
dm? 


Por the joint estimation of c and to the variances of the maximum likelihood estimators 
are given by 

var (6j) = — [/(to - f) - /(m -1)] 


to 2 (to+1)[/(to-|)-/(to-1)] c 2 

= TO a (2m-l)[/(TO-t)-/(TO-l)]-(m+l) X ^ 


and 


, * , 1 2m - 1 

mW -ss , ra? 


to 2 ( 2 to — 1 ) 


to 2 (2to— 1) [/(to— f) —/(to— 1)] — (to+ 1) -ZV" 

All of these results, with the exception of (40), were derived in Pisher’s original paper. 


(40) 


(41) 



Herbert S. Sicjhel 


413 


Denoting by a bar the estimators based on ordinary moments, we have 

v “® " (Srsj s- 

and for estimation of o and m when the other one is known 

m — 1 c a 
var (c) = —— - x , 


' ' — 2m —5 N’ 

(2m-3) 2 (m-l) 
var m - -—-—- 1. --. T - . 

' (2m —5) N 

For the joint estimation of c and m by the method of moments 


var( Cj ) = 


var (to j) = 


(to — 1) (2m — 3) (8m 3 — 48m 2 + 108m— 83) c 2 
3(2m—5) (2m—7) (2m—9) X N’ 

2 (m— l)(2m—5)(2m —3) 2 (2m 2 —5?w+12) 


v J 3(2m— 7)(2m — Q)N 

all of which were derived previously by Fisher (1921), with the exception of (44) and (45). 
In practice we only need to consider the following cases: 

(a) Estimation of £, 

(b) Estimation of c when m is known, 

(c) Estimation of c and m jointly. 

In almost every application we will have to deal with case (a). 

(b) arises in the case of estimating the scale parameter of an error distribution, especially 
if N < 500. There is strong theoretical and experimental evidence that the errors of truly 
independent observations are distributed in a Pearson Type VII law with a shape parameter 
3 < m < 5. Jeffreys (1938, 1939,1948) has pointed out that the estimation of m from the data 
of a particular experience becomes unreliable if N is less than 500. In that case it is advisable 
to assume m as being known a priori, in the light of previous experimental evidence, with 
a magnitude of 4, say. * 

Most frequently, in practice, we do not know the magnitude of any of the parameters. 
We then have to estimate them from the data simultaneously, provided N is large enough. 
This is case (c). 

The method of frequency-moments lacks a location estimator. We, therefore, have to use 
either the mean or the median for estimating £. Problem (a) resolves into the question which 
of the two statistics, mean or median, is the more efficient one and for what range. From 
(37) and (42) we have, taking the maximum likelihood estimator as standard, 

where \ is the mean statistic. The variance for the median statistic is 

lh _ L _. c 2 tt r r ( m - -|) ~[ 2 m) 

var ® 4 2 Vf 2 (£) 4WL r(m) J ’ 


and its efficiency 


1 c 2 7rr r(m-j) -| 2 

vai 15 ) 4 P 7 [_ T(m) J 

t h - ^ m+i) r r(m) T 

nm- (s) — 7 fm (2 TO _ i) [_r(m —^)J ' 


Table 1 gives some numerical values for the respective efficiencies. It can be seen that for 
m< 3 the median statistic is the more efficient estimator for £. The exact crossing-over of 
efficiencies takes place at to = 2-840, 



414 


Method, of frequency-moments 


Table 1 


171 

Eff.(I) 


i 

0-0 

0-811 

1-5 

0-0 

0-833 

2 

0-500 

0811 

3 

0-800 

0-769 

4 

0-893 ' 

0-741 

5 

0-933 

0-723 

8 

0-955 

0-710 

7 

0-967 

0-700 

8 

0-975 

0-693 

9 

0-980 

0-687 

10 

0-984 

0-682 

co 

1-000 

0-637 


In the practical applications of the second part of this paper, the median statistic was used 
for locating the curve whenever m, the frequency-moment estimator, was $ 2-840. In all 
other cases the mean statistic wa,s employed, 

Estimation problem (b) leads to the interesting discovery that, when N is large, the 
maximum likelihood estimator of c can be expressed in terms of simple working probability- 
moments. Eor the likelihood solution of (35) we have 

1 


3 . 2m -1 

5^1089- — 


This may be written as 
where 

But for large N, 
and (50) becomes 


2m- 

2m 


(t\ vm v 

Vod *™' 1 1 : 

Wo/ i=i 

Jc 2 + K-g) 2 H 


= 0. 
i/m 


(50) 




T(m) 




S [f(Xi)] llm ±N I m[f(Xi)] llm - MWi>/« 

4=1 2=1 


2 m —1 
2m 


0 -m 

Vur(m-i) < m+1)/m ~ C(m+1) '" 1 ’ 


Por a population parameter m = 1 we have 


c = 


2tto “■ Ca ’ 


and for m = 2 


a O 

~ 87ro| Ci ' 


(W) 

(52) 

(53) 


Hence it follows that we will obtain maximum efficiency by estimating c from frequency- 
moments of order 2 and f in the case of m = 1 and 2 subject to N being large. 

The variance of the frequency-moment estimator 6 for small deviations and large N, is, 
from (Si), - / 3c \» , 

'( c ) = 57- var(o (m+1)/ J 


var 


_ / me \ 2 


from (10) 


L. I var (o ( m ^ 

AHn+D/m/ 


var 


(°(m+u/m) = 4(~^) ["( m + 2 )/m -wf m+1)/ J, var (c) = c 2 (m + l) 2 r ^ ( ™ +2>/m -ll, (54) 
iV \ m ' N LS'Hm+Dim ' J 




Herbert S. Sioiiel 


415 


for Type VII law: 


"n=?(V 7rc ) ] 


L _*rr 

T(m) ~] n 

T(nm) 

_r(m-i)_ 


Putting n = (m+2)/m and n = (m + l)/m and substituting into (54) gives finally 

ra-t-1 


var (c) = 


2m- 


c 2 

■l X N’ 


(55) 


(56) 


which is the same as (38), the variance of the maximum likelihood estimator. 

When fitting by frequency-moments in practice, the only sets of statistics used so far 

v V v y v v 

wer and J 3 and J V J$ and J 2 (Sichel, 1947). It is desirable to keep to this procedure in 

order to avoid computational complications and, generally, to make the method as simple 
as possible. 

Corresponding to the sets of statistics just mentioned we have for a Type VII law the 
frequency-moment estimators (52) and (53). For large N their respective variances are 


from (10) 

var (Cj) = ( 

T 

) var(Og) ]c 2 , 

(57) 

and 

var (C|) = | 

2 c\ 

1 var(oj) =^[ ai -l]c 2 . 

(58) 


By use of equation (55) we can express oq and a } as functions of m. Finally, from (38), (57) 


and (58), 


and 


Eff. (4) = 


Eff. (cj) = 


m + 
2m- 

m + 


I IS r r(m-i)r(3m-|) \ / r(2m) \» _ " 

■1/ LI r(m)T(3m) )\T(2m~i)) "J’ 


2m- 


hi 


"/T(m- 1) T(2m- 
_\ T(m) F(2m) 


rni r(|w) \ 2 1 

/\r(|m-i)j 


(59) 

(60) 


The corresponding efficiency for the solution by moments from equations (38) and (43) is 


Eff. (c) 


(m +1) (2m — 5) 
(m—1) (2m-1)' 


(61) 


Numerical values for the efficiencies (59), (60) and (61) are tabulated in Table 2. From this 
table we may draw some interesting conclusions: 


Table 2 


m 

Eff. (4) 

Eff. (4) 

Eff. ( 0 ) 

0'5 

O'O 

0-0 

0-0 

0-8 

0-993 

0-904 

0-0 

1 

1-000 

0-951 

0-0 

2 

0-962 

1-000 

0-0 

3 

0-926 

0-992 

0-400 

4 

0-903 

0-981 

0-714 

5 

0-887 

0-972 

0-833 

6 

0-875 

0-965 

0-891 

7 

0-867 

0-960 

0-923 

8 

0-860 

0-955 

0-943 

9 

0-855 

0-952 

0-956 

10 

0-851 

0-949 

0-966 

OO 

0-808 

0-916 

1-000 


Biometrika 36 


37 




416 


Method of frequency-moments 


(1) For strongly leptokurtic populations of the Type VII law.(low values of m) the fre¬ 
quency-moment estimators are substantially more efficient than the conventional moment 
estimators. 

(2) c } is a more efficient estimator than c s , having the remarkable property of varying in 
efficiency only between 0-92 and 1-00 for such a wide range as 1 <m < do. 

It is the Type VII law of low m which is of practical importance. For example, a normal 
curve can very well represent a Type VII law even down to m = 7. For a sample size of say 
N = 1000 the y 2 -test would still indicate a very good agreement with normal theory even 
if the true population follows a Type VII law. For a sample size of N ~ 1000 we shall detect 
real leptokurtosis only if m< 7. It is precisely in this range that the method of frequency- 
moments scores over the method of moments. It also compares very favourably with the 
maximum likelihood method because the efficiency of 4 varies between O'95 and 1-00 for 
1 7, the computational work, however, being less laborious. 

Finally, we have to consider the most frequent estimation problem (c) when c and m 
are unknown. We have a _ 

For small deviations of rh, from m and for large N 

31oga } ' 


var (rtij) 


0^) ™K> = |_ a r 


3 m 


var (a*). 


(62) 


var («j) may be expressed in terms of m with the help of equation (55). (62) then becomes 


r(m-i) 

4T(3m-^) 

F(2m) 

2 9r(2m-J) 

f r(H i 

2 12F(fm-|) F(fm) T(2m)) 

T(m) 

T(3m) 

Lr(2m-|)J 

1 F(2m) 

Lr(fm-i)J 

r(|m)r(|m-^)r(2w-l)j 


1)~- f(m"f)') + 2[/(2wi-1) -- Flint-- 1) | - 3[/(-| m1) - r(fm-f)]'p 

(f 


The efficiency of the joint estimator rhj (frequency-moment method) is 


Eff- (m,j) = 


var (m j) __ equation (41) 
var [mj) equation (63) ’ 


For the corresponding moment method 
The efficiencies (64) and (65) have been tabulated in Table 3. 


Eff. (;m ) = vat ^ = e 4 uation ( 41 ) 
J var (nij) equation (46) ’ 


(64) 

(65) 


Table 3 


m 

■ Eff. 0V) 

Eff. (mjj 

0'5 

0-0 

0-0 

0'7 

0-677 

0-0 

1 

0-758 

0-0 

2 

0-697 

0-0 

3 

0-661 

0-0 

4 

0-624 

0-0 

6 

0-607 

0-169 

6 

0-596 

0-429 

7 

0-588 

0-594 

8 

0-582 

0-690 

9 

0-577 

0-769 

10 

0-573 

0-818 

00 

0-537 

1-000 




Herbert S. Sichel 


417 


Again, the frequency-moment estimates are much better than the moment estimates in 
the important range 1 ^ m < 7. There is a loss of information in comparison to the maximum 
likelihood solution. However, it is not as serious as it first may seem, because we deal with 
large samples, so that the variances will be comparatively small. Furthermore, given a 
particular sample after the sampling has taken place, it is the ratio of the standard errors 
rather than the ratio of the variances which is of practical importance. This value */[EfF, (%)] 
is quite reasonable for the range 1 7. On the other hand, the computational saving is 

considerable when using the new method. 

It is hoped that the frequency-moment method .will appeal to the practical research 
worker as a reasonably efficient method which involves fairly simple arithmetic. 

Part 2. Practical illustrations 

It has been pointed out repeatedly that the method of moments will completely break down 
when, for a Type VII law, the shape parameter m ^ f. Jeffreys (1948) remarks: ‘In this case 
the expectation of the fourth moment is infinite; but the actual fourth moment of any finite 
set of observations is necessarily finite and any set of observations derived from such a law 
would he interpreted as having m > f. 1 Again, Fisher (1921) proved that the estimation of 
the parameters of a Type VII law by the method of moments becomes very inefficient except 
in the region near normality. 

Here, then, is a field for the practical application of the method of frequency-moments 
because all probability-moments exist and because the efficiencies of estimating the para¬ 
meters by this method are greater than in the case of the ordinary moment solution. 

In testing the goodness of fit we know that the contributions to the total are made up of 

(a) a portion deriving from ‘ errors of estimation ’; 

(b) a portion deriving from ‘discrepancies of observations from hypothesis’ (Fisher, 1948). 

The errors of estimation are dependent on the efficiency of the estimation process. In 

general, we should expect smaller y 2 ’s when fitting a Type VII law with m< 7 by the fre¬ 
quency-moment method than when fitting by the method of moments. This hypothesis was 
put to a practical test by graduating observed distributions first by the method of moments 
and then by the method of frequency-moments. 

Unless we employ fine grouping we introduce a bias into the estimation of parameters by 
equating the nth working frequency-moment statistic to the integral of the nth power of the 
probability law, i.e. 

£/f=-H y n dx = J n , 

i= 0 J — co 

where h = width of class interval. This difficulty may be overcome by estimating the mid- 
ordinates u t of the frequency groups by Hardy’s formula (1909) 

where represent the original frequencies. The sum of the nth powers of the midordinates 
is an estimate of J n provided there exists high contact at both ends of the experience. 

An example will clearly illustrate the advantage of this procedure. 

The frequencies of „ 

y dx=~ ^e-oos^da; 

*J7T 


2 * 7-2 



418 Method of frequency-moments 


Table 4 


Class interval 

Frequency 

0- 1 

1643 

1- 2 

1376-5 

2- 3 

965 

3- 4 

567 

4- 5 

279 

5- 8 

116 

6- 7 

39-5 

7- 8 

11-6 

8- 9 

3 

9-10 

0-5 

10-11 

0 

Total 

5000-0 

Total of entire 
distribution 

10000-0 


were arranged as shown in Table 4. Using uncorrected frequencies we have 

J? 

Jl- 10000, J 2 = 11879366, p 1 » y = 8-41796. 

t / 2 

In a normal population cr = 0-282095p x . Hence s — 2-3747 as compared with cr — 2-3570. 
Tor the corrected frequencies (by Hardy’s rule) we find 

J x = 10000, J 2 = 11965747, p x = 8-35719. 

Hence s oor _ = 2-3575, which is very near to the true value. 

For the purpose of testing the effect of grouping, two adjacent frequency groups were 
amalgamated and finally three groups, each set giving rise to two possible variations. When 
applying the midordinate rule it sometimes happens that there is a negative adjustment to 
a cell at the end of the experience which in the original series had no frequency. In such a 
case it is best to combine two or three groups and apply their total adjustments jointly. 

Table 5 gives a comparison of corrected and uncorrected standard deviations as estimated 
from the various groupings of the original frequencies of Table 4. The adjustments are not 
as effective as Sheppard’s corrections are to the raw moments, but in all cases they decrease 
the bias considerably. 

Table 5 


Class 

interval 

Uncorrected 

s 

Bias 

5 — <T 

Corrected 

5 cnr. 

Bias 

^cor. & 

1 

2-375 

0-018 

2-358 

0-001 

2 

2-427 

0-070 

2-365 

0-008 

2 

2-427 

0-070 

2-365 

0-008 

3 

2-518 

0-161 

2-396 

0-039 

3 

2-512 

0-165 

2-389 

0-032 





Herbert S. Siohel 419 

When fitting a Type VII law distribution by the method of frequency-moments we must 

solve the equation ^ r(fro) T 

“* T(^) r(2->%) Lr(i*-J)_ 

for m. Table 6 facilitates the computations to a great extent. The slight irregularities in the 
i 2 column are due to the uncertainty of the last digit in the calculation of a t . No attempt of 
adjusting the second differences was made. 

Table 6 


ffl 


S‘ + 

£ 4 + 

rh 

Clj 

S 2 + 

rfi 


<J 2 + 

0-6 

oo 



3-7 

1-08289 

6 

0-0 

1-06869 

6 

0-0 

2-09018 



3-8 

1-08218 

4 

9-6 

1-06823 

6 

0-7 

1-64601 

37382 


3-9 

1-08161 

4 

10-0 

1-00782 

4 

0-8 

1-36886 

9132 

22088 

4-0 

1-08088 

3 

10-6 

1-06745 

4 

0-9 

1-28343 

3670 

3737 

4-1 

1-08028 

4 

11-0 

1-06712 

3 

1-0 

1-23370 

1746 

1066 

4-2 

1-07972 

3 

11-6 

1-00682 

3 

1-1 

1-20142 

976 

301 

4-3 

1-07919 

3 

12-0 

1-06655 

2 

1-2 

1-17890 

698 

172 

4-4 

1-07869 

3 

12-6 

1-06630 

2 

1-3 

1-16230 

392 

83 




13-0 

1-06607 

2 

1-4 

1-14974 

209 

48 

4-4 

1-07869 

11 

13-6 

1-06686 

1 

1-6 

1-13981 

194 

22 

4-6 

1-07777 

8 

14-0 

1-06566 

2 

1-6 

1-13182 

141 

21 

4-8 

1-07693 

9 

14-5 

1-06548 

1 

1-7 

1-12624 

109 


6-0 

1-07618 

6 

16-0 

1-06531 

1 

1-8 

1-11976 

84 


6-2 

1-07649 

6 




1-9 

1-11610 

06 


6-4 

1-07486 

6 

16-0 

1-06631 

5 

2-0 

1-11111 

63 


6-6 

1-07427 

6 

16-0 

1-06501 

3 

2-1 

1-10706 

44 


5-8 

1 07374 

3 

17-0 

1-06474 

3 

2-2 

1-10403 

36 


0-0 

1-07324 

6 

18-0 

1-06450 

3 

2-3 

1-10197 

29 


6-2 

1-07279 

2 

19-0 

1-06429 

2 

2-4 

1-09960 

26 


6-4 

1-07236 

3 

20-0 

1-06410 

2 

2-5 

1-09749 

21 


6-6 

1-07196 

3 

21-0 

1-06393 

2 

2-0 

1-09669 

19 


6-8 

1-07169 

2 

22-0 

1-06378 


2-7 

1-09388 

16 


7-0 

1-07124 

3 




2-8 

1-09232 

14 


7-2 

1-07092 

1 

OO 

1-06060 


2-9 

1-09090 

12 


7-4 

1-07061 

2 




30 

1-08900 

11 


7-6 

1-07032 

2 




3-1 

1-08841 

9 


7-8 

1-07006 

1 




3-2 

1-08731 

8 


8-0 

1-06979 

1 




3-3 

1-08029 

8 


8-2 

1-06954 

2 




3-4 

1-08636 

0 


8-4 

1-06931 

1 




3-5 

1-08447 

7 


8-6 

1-06909 

2 




3-0 

1-08300 

. 4 


8-8 

1-06889 

0 




3-7 

1-08289 

6 


9-0 

1-06869 

1 





A detailed calculation of the various statistics involved in the fitting of a Type VII dis¬ 
tribution by the method of frequency-moments is shown in Table 7. The observed experience 
is one of the marginal distributions of the bivariate table, given in Shewhart s book (1.931, 
p. 402), relating to the distribution of random noises (machine measures). 


Now v J. 

J n = SW = 4 = 0-38603, 

From Table 6 we see that for this value of cq 



V V 


1-08387. 


3-6 <m< 3-6, 


and by inverse interpolation we find m = 3-674. 




420 


Method of frequency-moments 
Table 7 


1 


A ft 

A*A-i 


Ut 

(%)» 

(nt \2 

fi 

t. 

Ui-fif 

value 

\ U i) 

Jt Ji 

ft 

0-1349 

0 

0 






0-9 

) 


0-1421 

0 

l 

0 




0-7 



0-1493 

1 

1 

-l 

0 

1-0 

1-00 

1-00 

1-4 

y- 3-7 

0-999 

0-1565 

1 

0 

7 

-0-3 

0-7 

0-59 

0-49 

3-2 



0-1637 

8 

7 

8 

-0-3 

7-7 

21-33 

59-29 

7-5 

J 


0-1709 

23 

15 

9 

-0-4 

22-6 

107-35 

510-76 

18-6 

+ 4-4 

1-041 

0-1731 

47 

24 

26 

-1-1 

45-9 

310-74 

2106-81 

45-5 

+ 1-5 

0-049 

0-1853 

97 

60 

45 

-1-9 

95-1 

927-22 

9044-01 

101-7 

- 4-7 

0-217 

0-1926 

192 

95 

-62 

4-2-6 

194-6 

2714-67 

37869-16 

181-8 

+ 10-2 

0-572 

0-1997 

225 

33 

-86 

4-3-6 

228-6 

3456-43 

52257-96 

224-4 

+ 0-6 

0-002 

0-2069 

172 

— 53 

-22 

4-0-9 

172-9 

2273-63 

29894-41 

180-0 

- 8-0 

0-366 

0-2141 

97 

— 75 

23 

-1-0 

96-0 

940-80 

9216-00 

100-0 

- 3-0 

0-090 

0-2213 

45 

— 52 

26 

-1-1 

43-9 

291-06 

1927-21 

44-6 

+ 0-4 

0-004 

0-2286 

19 

— 26 

14 

-0-6 

18-4 

78-94 

338-56 

18-2 

+ 0-8 

0-035 

0-2357 

7 

— 12 

12 

-0-5 

6-5 

16-57 

42-25 

7-4 

\ 


0-2429 

7 

0 

- 6 

4-0-3 

7-3 

19-71 

63-29 

3-1 



0-2501 

1 

— 6 

6 

-0-2 

0-8 

0-71 

0-64 

1-4 

1+ 1-5 

0-167 

0-2673 

0 

— 1 

1 

0-0 




0-7 



0-2645 

0 

0 






0-9 



Totals 

942 

0 



942-0 

11160-75 

143321-84 

942-0 

0-0 

3-532 


Further, we have 

Hence 

Finally, 


, i r r(^) 

°' fnol\V(rh-i)} L T(|^) _ ' 


Vo = . 


c= 3-91345. 


= 228-66, 


c>r(m-£) 

and the equation of the fitted curve taldng the class interval as unity 

y = 228-66(1 + 0-066295a; 2 )- 8 - B74 . 


As m > 2-84 we use the mean statistic for locating the experience. Centring the frequency 
group 225 at zero on the x-scale and having the positive direction of the scale downwards 
in the f { column of Table 7, we find 

£= -0-011677. 

Ordinates were calculated at the beginning, centre and end of each group, and areas were 
found by a quadrature formula. The expected frequencies are given in the column of 
Table 7. For a y 2 = 3-532 we find for 7 degrees of freedom 

P = 0-83. 

A corresponding moment fit leads to the equation 

y = 215-52(1 + 0-029992x 2 )- 6 ’ 227 , 
with y 2 = 6-756 and P = 0-46. 

In this particular case the frequency-moment method gives, therefore, a better fit than the 
conventional method. 






(Column, numbers correspond to data, derived from ‘Sources of Data* listed under the references) 


Herbert S. Siohel 


421 


S' 

rH 

in >piO 9 UDIOIOIC 

CO AAeGCOHiCSHiCfrlOOeqcqcbditr'Cq© 

00 O CO C~ © CO Cq Hi HlMHH (M *—< CO 

(M cq rH 

1000 

ID CO 

CO eq 
© l> 
rH eq 

0-20879 

0-00279 

304-85 

84-14 

ID ID 

9 cp 
r cb 6 
o 

rH 

rH 

© © 

ID O 

66 

V 

S 

rH 

Hi Hi CO H< H< CO CO^'rH 
rH CO lO CM H 

rH 

300 

ID CO 

O UJ 

o o 

■Hi cb 

0-03783 

0-02140 

CD Hi 

05 rH 

CD A 

rH rH 
rH rH 

co cq 

Hi © 

6 A 

CO o 
a oo 
66 

CO 

ia lfliflia lain i 

© O«brFO<Nl0C000'«tC?5Ot>05C0l>l0i0C0Cq 

00 CilOlOrHO'rrHOlO^’^'^COrHO'lrH 

10 lOitrtMrlH 

3003 

os cq 

CO CO 

co 9 
cq A 

0-02009 

0-00652 

00 ID 

CO 9 

00 05 
rH CO 

CD ID 

© CO 

Hi CO 

Hi A 

CO rH 
rH 

rH rH 
© © 

© © 
66 
y v 

( 12 ) 

H< COrHOOCMOOCOOSCOOOilOlOlMrHrHrH 

CO t s *lOffOCqcO«DCO<Mr-lrH 
rH rH rH rH rH 

CO 

§ 

rH 

Hi 

05 00. 

I> CO 

cb oo 

00 « rH« 
©j-H 1 

© 2 w 2 
-Hi M -Hi M 

■Hi X cb x 

co eq 

CO -Hi 

6 A 

05 CO 

rH rH 

I> 6 

cq r- 
6 © 
rH rH 

eq co 

Hi Cp 
66 

Central 
value 2 

iO IQ lO lO 10 »o >o io lO >o »0 lO 1 C UJ IQ lo >o 

© A eq eb A ib cb t> oo 6i © A eq co A id cb A cb 6 

+ +++++++++++++++++++ 







i—i 

rH 

C 0 MC 0 « 0 <tfCftOTH<X>i-HrH 
rH CO (tq CO O CO rH 
rH CO rH 

670 

cq cq 

rH CO 

© 

co co 

L0 05 

CO CO 
-Hi rH 
rH 6 

rH CD 

Hi Hi 

Hi © 

00 t- 

co eq 

CO 00 
© cq 

6 cb 
id 

rH 

©o 

cq© 

©6 

y 

( 10 ) 

H<0®0 CD Cq CO IO rH Hi i-H CO t> O rH C-1 rH 
rH CO Hi CO H* cq rH 

274 

O H< 

» -Hi 

o io 
cq A 

CO 00 

O Cl 
■Hi t- 
rH cq 
rH O 

6 6 

co eq 

Hi Hi 
66 

CD ID 

CO t> 

Hi Cp 

eq 6 

CO Hi 
t> H 
© 6 

S 

r~T^r«0 H< rH H H< Cq 05 CO 
rH CO Oi (N O W5 <M 



0-61829 

0-17039 

io eq 
co <d 

ib A 

ID CD 

Hi c0 

CO CO 

eq rH 

6 6 
cq 

iH 

a o 
eq© 
6 © 

V 

5 

CO COOCOOCOCOCOCOlQCTiOOCOCOCOL'-TlilOfNOrH 

rHr-HOl^COCOlHOOeOCqrH 
rH rH 

700 

oo o 

•Hi co 

Hi T 1 

rH cb 

0-32135 

0-05026 

CO CD 

9 9 

rH <b 

05 CO 
rH rH 

o © 

Hi 05 
rH Hi 

eq id 

iH 
rH © 
O© 
©6 

. V 

P 

io « oo HiHiooeoiDoaeqoHicOi-iiooot-HiHi Pcs' 

rH rHCOCOOJC^^COasOiTHrHt-OlOCftCqCiqCq 

iHC0C0©C3l>H<rH 

5013 

2-260 

3-300 

0-09377 

0-04299 

' 

Hi 05 
rH 9 

■ A 6 

r~ co 
005 
rH 

H< eq 
cq Hi 
ib 6 
cq cq 

rH 

0-04 

<0-001 

© 

CO ^CO {0(35 OHint-MN10HlNfl)0(N(M(Ma)CO^ 

H rHC0COHiHi0qC5©©lDl>t>H<Cq rH 

rH CQ CO 05 05 I> CO rH 

o 

■Hi 

ID 

Hi 

2- 797 

3- 272 

0-07753 

0-05600 

1024-14 

964-95 

19-6S 

35-16 

o-os 

<0-001 

s 

rH<O©COO0Q0COl> rH Cq GO* 
rH CO t> O O CO rH- 
rH rH 

392 

4- 546 

5- 716 

0-07766 

0-05515 

120-21 

115-81 

3-59 

3-90 

0-31 

0-28 

s 

rHGOlOGOCOlOCOlOlNCq 
HlOrHCOCOCOCqOJCOrH 
rH H 05 t- CO 

2885 

05 00 

CO ID 
co 9 
cb ib 

0-14580 

0-06638 

rH H< 

Hi CD 

eq 6 
o eq 

O 05 
rH 

in O 

9 03 

6 A 

ID l> 

< 0-001 
< 0 001 

co" 

rHrH COCOt-t-eq ID eqt-lO03I>t-«H 

01rf<05 05Cqi>05'#rH 

H(MH 


t— 

in cq 

9 cq 
cb cb 

0-06530 

0-02999 

228-66 

215-52 

3-53 

6-76 

0-83 

0-46 

S 

rH rHr) © C5t-COt>10<MOCOlOCOCOOrH 
rH Ol H< © l- H* CO rH 


3-261 

5-363 

0-05806 

0-02747 

[- CO 

r- tr* 
on A 

r-> cD 

Hi CO 

H l> • 
6 6 

6 6 

3 

CO rH CO (M 10 05 I-H fh'S 

<n t- oo o o oo cq 
i-H co cq 

906 

5-188 

14-064 

0-08100 

0-02474 

r- c- 
co o 

6 cb 

005 
co cq 

2-22 

3-85 

00 co 
id cq 

6 © 

Central 

I value ic 

fH U> 

O <D ' 

O r ^05COC-COiO-rHCO«qrHOrHOqcO'^iOCOl--QOC50 ^ 

7J 1 | II 1 II I ! + + + + + + + + + +'3 

1 § 

Si 

’ >S 16 

■M M 

4 4 

o c 

« w 
>X\X 


















422 Method of frequency-moments 

Altogether fifteen observed distributions, which are both symmetric and leptokurtic, 
were fitted. They are reported in Table 8, and the sources from which they were drawn are 
given at the end of the paper. In some cases ((4) (6), (7) and (11)) two or more adjacent cells 
were combined where many groups were given in the original observations. Distribution 
(15) was reported in unequal class intervals. It is reproduced without alteration, hence the 
2 -scale of Table 8 does not apply to it. For the estimation of the moments and frequency- 
moments of (15) the coarser class intervals were split arbitrarily so as to make the calculations 
at all possible. The y 2 -test, however, was applied to the original grouping as shown in 
column (15). 

Distributions (l)-(5) were located by their mean statistics as their m’s all > 2-84. The 
median statistics were used for locating distributions (6) to (11) as m<2-84. Experiences 
(12)—(15) were orig inally reported without distinction between positive and negative 
deviations from the mean. It was assumed that their population means are zero. Con¬ 
sequently one further degree of freedom was allowed in testing for goodness of fit. 

For lack of space frequencies in some cases ((6), (7) aird (8)) falling into cells +10 and over 
and -10 and under were lumped together. In the actual computations, however, the original 
tail groups were used. The brackets indicate the tail groupings employed in the y 2 -test for 
both methods of fitting. 

In cases where m < 3 it was found necessary to compute more than three ordinates per 
group, at least in the centre of the distribution, in order to make the error of the quadrature 
formula negligible. 

It is well known that the y 2 -test for goodness of fit is exacting whenever N is very large. 
In such cases the probability P associated with an observed y 2 will often be < (M)5, although 
the actual fit seems to be quite good (Elderton, 1938, p. 204). For this reason it is better to 
compare the actual y 2 ’s as derived from the two methods of fitting instead of their respective 
probabilities. 

A comparison of the y 2 rows of Table 8 shows that the method of frequency moments 
yields a better fit in fourteen out of fifteen Type VII distributions examined. The reduction 
of total y 2 is substantial in ten out of fifteen cases. As the various cells used were identical 
for both methods, and as the amalgamation of tail groupings for the y 2 -test were kept the 
same, it is reasonable to assume that the almost all-round lowering of y 2 ’s is due to a 
reduction of errors of estimation. On theoretical grounds we have expected such a result 
previously. 

For distributions (6) and (7) Jeffreys (1939) has estimated the scale and shape parameters 
of a Type VII law by his modified method of maximum likelihood. A comparison between his 
method, the frequency and the ordinary moment methods is given below: 


Distribution   Statistic   Modified maximum   Frequency   Moments
                           likelihood         moments
(6)            Shape m     2.710              2.797       3.272
               Scale       3.678              3.591       4.226
(7)            Shape m     2.257              2.260       3.300
               Scale       3.295              3.266       4.823




Herbert S. Sichel




Normally, we should not employ the method of frequency moments for the estimation of $\sigma$ when dealing with a normal universe. There is, however, one exception, first pointed out by Yule (1938). We are sometimes confronted with an experience which appears to be reasonably normal except for one or two outlying observations. If the material from which the sample has been drawn is normal, we overestimate the moments because of the disproportionate influence of the outlying observations. On the other hand, it is wrong to omit any observation unless we have evidence of a real blunder in determining or recording an observed quantity.

It is in such a situation that we may prefer the frequency-moment method to the more 
efficient method of moments (for a normal population) as the former method weights the 
tail ends far less than the centre of an experience whereas the opposite is true of the moment 
method. 

As an example, Pearson's bright line no. 2 experiment (1902) may be quoted. For the moment solution we find

$$b_1 = 0.00178, \qquad b_2 = 5.02565,$$

which suggests a Type VII law. The fitted equation was found to be

$$y = 100.88\,(1 + 0.036624\,x^2)^{-3.981},$$

giving the rather poor fit $P = 0.04$.


Table 9

Central                Type VII            Normal              Normal
value         f        (Mom.)    Dev.      (Mom.)    Dev.      (Freq. Mom.)   Dev.
+21           1
+19           0
+17           0         0.4
+15           0         0.2      -4.3*                -2.3*                   -0.7*
+13           0         0.5                 0.1                  0.1
+11           0         1.0                 0.4                  0.2
 +9           0         2.0                 1.4                  1.0
 +7           3         4.2                 4.4                  3.4
 +5           8         9.0      -1.0      11.1      -3.1        9.6           -1.5
 +3          31        19.0     +12.0      24.0      +7.0       22.2           +8.8
 +1          35        37.3      -2.3      43.6      -8.6       42.6           -7.6
 -1          73        64.6      +8.4      65.4      +7.6       66.8           +6.2
 -3          76        91.2     -15.2      82.8      -6.8       86.2          -10.2
 -5          96        99.1      -3.1      87.2      +8.8       91.6           +4.6
 -7          79        81.7      -2.7      77.1      +1.9       79.7           -0.7
 -9          60        52.9      +7.1      56.7      +3.3       57.0           +3.0
-11          30        28.8      +1.2      35.0      -5.0       33.6           -3.6
-13          17        14.2      +2.8      18.1      -1.1       16.2           +0.8
-15           5         6.7                 7.8                  6.4
-17           3         3.1                 2.8                  2.1
-19           1         1.6                 0.9                  0.6
-21           0         0.7      -2.9*      0.2      -1.8*        0.1           +0.8*
-23           0         0.4                 0.1
-25           0         0.2
-27           1         0.3

P                       0.04                0.44                 0.56

* Deviation for the pooled tail group (+7 and over, or -15 and under) used in the $\chi^2$-test.






Method of frequency-moments 

For the method of frequency-moments we have

$$a_4 = 1.06185.$$

From Table 6 we see that $m$ must be very far above 22. As the difference between a normal curve and a Type VII curve with $m > 30$ is very small, we proceed to fit a normal curve by the method of frequency moments. This gives

$$y = 92.57\,e^{-0.09994x^2},$$

with the good fit $P = 0.56$.

Had we known a priori that the sample came from a normal population, we should have fitted a normal curve by the method of moments, leading to the equation

$$y = 88.27\,e^{-0.09086x^2}, \qquad P = 0.44.$$

The original observations and the three resulting fits are given in Table 9.

Inspection of Table 9 shows that the frequency-moment fit is better than the ordinary moment solution, as

(1) $P$ is larger;

(2) absolute deviations are smaller in ten out of twelve cells;

(3) there are nine changes of sign as compared with six in the moment fit.
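Criteria (2) and (3) can be checked mechanically from the deviation columns of Table 9, using the twelve cells of the $\chi^2$-test (the two pooled tail groups and the ten cells from +5 to -13). A short sketch; the deviations are transcribed from the table and the helper names are ours:

```python
# Deviations (observed minus fitted) in the twelve chi-square cells of Table 9,
# from the upper pooled tail group down to the lower pooled tail group.
normal_moments = [-2.3, -3.1, 7.0, -8.6, 7.6, -6.8, 8.8, 1.9, 3.3, -5.0, -1.1, -1.8]
normal_freq_moments = [-0.7, -1.5, 8.8, -7.6, 6.2, -10.2, 4.6, -0.7, 3.0, -3.6, 0.8, 0.8]

def sign_changes(devs):
    """Number of sign changes between consecutive cells (zero deviations skipped)."""
    signs = [d > 0 for d in devs if d != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

# Cells in which the frequency-moment fit has the smaller absolute deviation.
smaller = sum(abs(f) < abs(m) for f, m in zip(normal_freq_moments, normal_moments))

print("closer cells:", smaller, "of", len(normal_moments))
print("sign changes (freq. moments):", sign_changes(normal_freq_moments))
print("sign changes (moments):", sign_changes(normal_moments))
```

More sign changes indicate that the deviations behave more like random fluctuations and less like a systematic misfit of the curve's shape.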

The use of Hardy’s formula in the practical applications of the method of frequency- 
moments introduces certain inconsistencies with regard to the standard errors and efficiencies 
as derived in the theoretical part of the investigation. It is confidently felt, however, that the 
discrepancy between the practical and theoretical approach is small, just as, in practice, 
one works so often with Sheppard’s correction without taking it into account when estimating 
standard errors. 

It gives me great pleasure to express my thanks to Mr J. E. Kerrich for much helpful 
criticism, to the South African Council for Scientific and Industrial Research for permission 
to publish this paper, and to the staff of the Statistical Section for assisting me in heavy 
computational work. 


REFERENCES 

Elderton, W. P. (1938). Frequency Curves and Correlation. Cambridge University Press. 

Fisher, R. A. (1921). On the mathematical foundations of theoretical statistics. Philos. Trans. A, 
222, 309. 

Fisher, R. A. (1948). Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd. 

Hardy, G. F. (1909). The Theory of the Construction of Tables of Mortality and of Similar Statistical 
Tables in Use by the Actuary. London: Institute of Actuaries. 

Jeffreys, H. (1938). The law of errors and the combination of observations. Philos. Trans. A, 237, 
231. 

Jeffreys, H. (1939). The law of errors in the Greenwich variation of latitude observations. Mon. 
Not. R. Astr. Soc. 99, 703. 

Jeffreys, H. (1948). Theory of Probability. Oxford: Clarendon Press. 

Kendall, M. G. (1945). The Advanced Theory of Statistics, 1. London: Charles Griffin and Co. Ltd.

Pearson, K. (1902). On the mathematical theory of errors. Philos. Trans. A, 198, 235.

Shewhart, W. A. (1931). Economic Control of Quality of Manufactured Product. New York: van Nostrand.

Sichel, H. S. (1947). Fitting growth and frequency curves by the method of frequency moments. J.R. Statist. Soc. 110, 337.

Yule, G. U. (1938). On some properties of normal distributions, univariate and bivariate, based on 
sums of squares of frequencies. Biometrika, 30, 1. 





SOURCES OF DATA 

(1) Terman, L. M. (1928). Intelligence quotients. From The Measurement of Intelligence. London: Harrap and Co. Ltd.

(2) Nat. Inst. Pers. Res. Visual discrimination scores. Unpublished research data.

(3) Shewhart, W. A. (1931). Random noises. From Economic Control of Quality of Manufactured Product. New York: van Nostrand.

(4) Michelson, A. A. and others (1935). Light velocities. J. Astrophys. 32, 56.

(5) Fleminger, M. E. Output deviations. Unpublished research data.

(6, 7) Holme, H. R. & Symms, L. S. J. (1939). Variation of latitude. Mon. Not. R. Astr. Soc. 99, 642.

(8) Hansmann, G. H. (1934). Distribution of third moment coefficients. Biometrika, 26, 129.

(9) Sichel, H. S. (1947). Sampling experiment on Type VII population. J. Roy. Statist. Soc. 110, 337.

(10) Kerrich, J. E. Fortnightly gains of oxen. Unpublished research data.

(11) Mills, F. C. (1938). Prices and commodities in 1927. From Statistical Methods. New York: Holt and Co.

(12) Bond, W. N. (1935). Variation of judgement on the position of a fuzzy object. From Probability and Random Errors. London: Arnold.

(13) Pearson, K. & Moul, M. (1927). Distribution of tetrads. Biometrika, 19, 246.

(14) Bessel, F. W. (1838). Errors of right ascensions. Astr. Nachr. 15, 358.

(15) Pearson, E. S. (1929). Student's ratio for samples of two. Biometrika, 21, 259.





ON THE USE OF STUDENT'S t-TEST IN AN ASYMMETRICAL POPULATION

By S. G. GHURYE

On account of the unique property of samples from a normal population that the ratio $(\bar x - \mu)\sqrt{n+1}/s$, where $\mu$ is the population mean,

$$\bar x = \sum_{i=1}^{n+1} x_i/(n+1) \quad\text{and}\quad ns^2 = \sum_{i=1}^{n+1}(x_i - \bar x)^2,$$

is the ratio of a normal deviate to a stochastically independent estimate of its standard deviation, Student's t-test is a suitable test of significance for the mean of a normal population. However, in a variety of cases it is necessary to test the mean of a population which does not follow the Gaussian law. Efforts have therefore been made to see how far Student's distribution may be used for this purpose in non-normal populations. Mainly because of the analytical difficulties of the problem, no extensive theoretical discussion has yet been given. Thus Pearson & Adyanthaya (1929), Rietz (1939) and Nair (1941) have given experimental treatments, while the theoretical discussions of some others (Rider, 1929; Perlo, 1933; Laderman, 1939) have dealt only with trivially small sample sizes. The papers by Bartlett (1935) and Geary (1936, 1947) give results true for any sample size, though they are based on certain assumptions and approximations. The present paper deals with the population considered by Geary in his 1936 paper, subject to the same approximations. Geary's second contribution (in which the t-distribution is derived for samples from a population departing further from normality than that considered in the 1936 paper) came to my notice too late to be used in the present work, but I propose to consider it later.

Geary (1936) has obtained the distribution of the ratio $(\bar x - \mu)\sqrt{n+1}/s$ for an asymmetrical population whose fourth and higher cumulants are zero, by neglecting squares and higher powers of the third cumulant. From this we know how far the probability of an error of the first kind (i.e. the probability of rejecting the null hypothesis when it is true) in such a population differs from that for a normal distribution, provided we may neglect the square of the standardized third cumulant $\gamma_1$. Here again, because of analytical difficulties, it is not possible, except for very small sample sizes, to consider the effect of terms containing higher powers of $\gamma_1$. However, we can assume Geary's result to be correct for very small values of $\gamma_1$, and also for large sample sizes, although in such cases the deviation from the values of normal theory is practically negligible. Even then, it is of interest to know whether, in using the usual tables of the t-test (based on the normal distribution), we commit the greater error in the probability of an error of the first kind or in that of an error of the second kind. The present paper derives the probability of an error of the second kind (and hence the power of the test) when the usual t-tables are used to define the critical region.

It may be mentioned here that this problem is only a special case of a general investigation, on which the writer is engaged, into the effect on statistical tests of differences between the actual and the assumed distribution laws of the universe sampled. The solution of these problems is hampered by analytical difficulties in deriving the probability laws (and particularly the power functions), and the present case is one of the few in which a mathematical, though only approximate, solution has been found possible.





Expression for the power function 

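Let the variate $x$ have mean $\mu_1$, standard deviation $\sigma$, third cumulant $\kappa_3 = \gamma_1\sigma^3$, and all higher cumulants zero; $\gamma_1$ is assumed to be small, squares and higher powers being neglected. The distribution function of $x$ is given by

$$dF = \frac{1}{\sqrt{2\pi}}\left\{1 + \frac{\gamma_1}{6}(\xi^3 - 3\xi)\right\}\exp(-\tfrac12\xi^2)\,d\xi,$$

where $\xi = (x - \mu_1)/\sigma$. Again neglecting $\gamma_1^2$ and higher powers, the probability of a sample of $n+1$ independent values $x_1, x_2, \ldots, x_{n+1}$ is given by

$$(2\pi)^{-(n+1)/2}\left\{1 + \frac{\gamma_1}{6}\left(\sum\xi^3 - 3\sum\xi\right)\right\}\exp\left(-\tfrac12\sum\xi^2\right)d\xi_1\cdots d\xi_{n+1}.$$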
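A quick numerical check of this density (our own sketch, with $\gamma_1 = 0.4$ as an arbitrary example): to the order retained it has total probability 1, mean 0, variance 1, third moment $\gamma_1$ and fourth moment 3, i.e. $\kappa_3 = \gamma_1$ and $\kappa_4 = 0$. The first-order form is not a proper density far in the tail opposite the skew (for $\gamma_1 = 0.4$ it turns negative below roughly $\xi = -2.9$), which is one reason $\gamma_1$ must be small.

```python
import math

def density(xi: float, gamma1: float) -> float:
    """First-order density of xi = (x - mu_1)/sigma when kappa_3 = gamma1 * sigma^3
    and all higher cumulants are zero (terms in gamma1^2 neglected)."""
    phi = math.exp(-0.5 * xi * xi) / math.sqrt(2.0 * math.pi)
    return phi * (1.0 + gamma1 / 6.0 * (xi ** 3 - 3.0 * xi))

def raw_moment(r: int, gamma1: float, lo: float = -12.0, hi: float = 12.0,
               steps: int = 24000) -> float:
    """E[xi^r] under `density`, by the trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (lo ** r * density(lo, gamma1) + hi ** r * density(hi, gamma1))
    for k in range(1, steps):
        x = lo + k * h
        total += x ** r * density(x, gamma1)
    return total * h

gamma1 = 0.4
m = [raw_moment(r, gamma1) for r in range(5)]
print("mass", m[0], "mean", m[1], "variance", m[2] - m[1] ** 2)
print("third moment", m[3], "fourth moment", m[4])
```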

The part not containing $\gamma_1$ is the same as for the normal population. In what follows we assume that the table of the t-test based on the normal distribution is used for the significance level, i.e. for the probability of an error of the first kind. The contribution of the part not containing $\gamma_1$ is therefore the same as for a sample from a normal population, for which the power of the t-test has been considered by Neyman et al. (1935) and again by Neyman & Tokarska (1936). Hence we shall consider below only the additive part

$$\frac{\gamma_1}{6}(2\pi)^{-(n+1)/2}\left(\sum\xi^3 - 3\sum\xi\right)\exp\left(-\tfrac12\sum\xi^2\right)d\xi_1\cdots d\xi_{n+1}. \qquad (1)$$


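The joint distribution of $\bar x$ and $s$ can be obtained by the substitution used by Geary (1936), and is found to be

$$dF(u, \chi) = \frac{\gamma_1}{2^{(n-2)/2}\,\Gamma(\tfrac{n}{2})\,6\sqrt{2\pi(n+1)}}\{u^3 + 3u\chi^2 - 3(n+1)u\}\,\chi^{n-1}\exp\{-\tfrac12(\chi^2 + u^2)\}\,du\,d\chi, \qquad (2)$$

where

$$u = \frac{(\bar x - \mu_1)\sqrt{n+1}}{\sigma} \quad\text{and}\quad \chi^2 = \frac{\sum(x - \bar x)^2}{\sigma^2} = \frac{ns^2}{\sigma^2}.$$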
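The polynomial in (2) can be traced back to (1) as follows (a sketch; the auxiliary symbols $\bar\xi$ and $e_i$ are introduced only for this step). Writing $\xi_i = \bar\xi + e_i$ with $\sum e_i = 0$, so that $u = \bar\xi\sqrt{n+1}$ and $\chi^2 = \sum e_i^2$,

$$\sum\xi_i^3 - 3\sum\xi_i = (n+1)\bar\xi^3 + 3\bar\xi\chi^2 + \sum e_i^3 - 3(n+1)\bar\xi = \frac{u^3 + 3u\chi^2 - 3(n+1)u}{\sqrt{n+1}} + \sum e_i^3 .$$

The term $\sum e_i^3$ changes sign under $e_i \to -e_i$, which leaves $u$, $\chi$ and $\sum\xi_i^2 = u^2 + \chi^2$ unchanged, so it integrates to zero. The remaining factor $1/\sqrt{n+1}$, together with the normal density of $u$ and the density of $\chi$ with $n$ degrees of freedom, gives the constant in (2).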


For testing the hypothesis $\mu = \mu_0$, the t-test is uniformly most powerful for the class of one-sided alternatives $\mu > \mu_0$ or $\mu < \mu_0$. Thus, to test the hypothesis $\mu = \mu_0$ (or $\mu \le \mu_0$) against the set of alternatives $\mu > \mu_0$, we find the value $t$ of Student's ratio such that the probability of exceeding it is a predetermined $\alpha$ (i.e. the value of $t$ in Fisher's tables corresponding to the probability $2\alpha$); we then reject the hypothesis $\mu \le \mu_0$ if the ratio $(\bar x - \mu_0)\sqrt{n+1}/s$ exceeds $t$, and accept it if it does not. In either decision we may be wrong: we may decide that $\mu > \mu_0$ when in fact $\mu \le \mu_0$ (error of the first kind), or we may accept $\mu \le \mu_0$ even when $\mu = \mu_1 > \mu_0$ (error of the second kind). From Geary's paper we can calculate the difference between the actual probability of an error of the first kind and the value $\alpha$ assumed. To obtain the corresponding correction for the probability of an error of the second kind (and hence for the power of the test), we have to integrate the expression (2) over the region of acceptance, i.e. over the domain defined by

$$\frac{(\bar x - \mu_0)\sqrt{n+1}}{s} \le t,$$

i.e. by

$$u \le \frac{\chi t}{\sqrt n} - \rho_n,$$

where $\rho_n = (\mu_1 - \mu_0)\sqrt{n+1}/\sigma$ and $t$ is a function of $\alpha$.

Then the probability of an error of the second kind is

$$P_{II}(\alpha, n, \rho_n) = P^0_{II}(\alpha, n, \rho_n) - \gamma_1 P'_{II}(\alpha, n, \rho_n), \qquad (3)$$




where $P^0_{II}$ is the $P_{II}$ for the normal population and $-\gamma_1 P'_{II}$ is the contribution of the terms containing only the first power of $\gamma_1$. The power function $1 - P_{II}$ is then $1 - P^0_{II} + \gamma_1 P'_{II}$, so that $\gamma_1 P'_{II}$ is the correction to be added to the 'normal-theory' value. Then

$$P'_{II} = \frac{-1}{2^{(n-2)/2}\,\Gamma(\tfrac{n}{2})\,6\sqrt{2\pi(n+1)}}\int_0^\infty\!\int_{-\infty}^{\chi t/\sqrt n - \rho_n}\{u^3 + 3u\chi^2 - 3(n+1)u\}\,\chi^{n-1}\exp\{-\tfrac12(\chi^2 + u^2)\}\,du\,d\chi.$$


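Now, the double integral

$$\int_0^\infty \exp(-\tfrac12\chi^2)\,\chi^{n-1}\left[\int_{-\infty}^{\chi t/\sqrt n - \rho_n}\{u^3 + 3u\chi^2 - 3(n+1)u\}\exp(-\tfrac12u^2)\,du\right]d\chi$$

$$= -\int_0^\infty\{2 + u_0^2 + 3(\chi^2 - n - 1)\}\,\chi^{n-1}\exp\{-\tfrac12(\chi^2 + u_0^2)\}\,d\chi, \qquad\text{where } u_0 = \frac{\chi t}{\sqrt n} - \rho_n,$$

$$= -\frac{\exp\{-\rho_n^2/(2a^2)\}}{a^{n+2}}\int_{-b}^{\infty}(Aw^2 + 4bw + B)(w + b)^{n-1}\exp(-\tfrac12w^2)\,dw,$$

in which $a^2 = 1 + t^2/n$, $b = \rho_n t/(a\sqrt n)$, $A = a^2 + 2$ and $B = (3a^2 - 2)\rho_n^2/a^2 - (3n+1)a^2$.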
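The two steps in this reduction rest on elementary results, stated here for convenience (a sketch, not part of the original text):

$$\int_{-\infty}^{u_0} u\,e^{-u^2/2}\,du = -e^{-u_0^2/2}, \qquad \int_{-\infty}^{u_0} u^3 e^{-u^2/2}\,du = -(u_0^2 + 2)\,e^{-u_0^2/2},$$

which give the second line; and, with $u_0 = \chi t/\sqrt n - \rho_n$ and $a^2 = 1 + t^2/n$,

$$\chi^2 + u_0^2 = a^2\chi^2 - \frac{2\rho_n t}{\sqrt n}\,\chi + \rho_n^2 = (a\chi - b)^2 + \frac{\rho_n^2}{a^2},$$

so that the substitution $w = a\chi - b$ (with $\chi = (w + b)/a$, $d\chi = dw/a$, and $w = -b$ at $\chi = 0$) gives the third line.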


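Hence

$$P'_{II} = \frac{\exp\{-\rho_n^2/(2a^2)\}\,J_n}{2^{(n-2)/2}\,\Gamma(\tfrac{n}{2})\,6\sqrt{n+1}\,a^{n+2}}, \qquad (4)$$

where

$$J_n = \frac{1}{\sqrt{2\pi}}\int_{-b}^{\infty}(Aw^2 + 4bw + B)(w + b)^{n-1}\exp(-\tfrac12w^2)\,dw. \qquad (5)$$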

Thus the problem of finding the values of $\gamma_1 P'_{II}$, the contribution of the first power of $\gamma_1$ to the power function, reduces to that of evaluating $J_n$.

Evaluation of $P'_{II}$

By expanding the binomial in (5), $J_n$ can be expressed as a linear function of $n + 2$ incomplete $\Gamma$-functions. However, the best method of evaluating $J_n$, at least for the values of $n$ considered below, seems to be one based on a reduction formula. Let

$$K_n = \frac{1}{\sqrt{2\pi}}\int_{-b}^{\infty}(w + b)^n\exp(-\tfrac12w^2)\,dw, \qquad (6)$$

so that

$$J_n = AK_{n+1} - 2a^2bK_n + (\rho_n^2 - 3n - 1)\,a^2K_{n-1}. \qquad (7)$$

From (6) we get, by partial integration,

$$K_n = bK_{n-1} + (n - 1)K_{n-2}. \qquad (8)$$

Denoting by $f$ the normal probability density $\frac{1}{\sqrt{2\pi}}\exp(-\tfrac12b^2)$ and by $F$ the distribution function $\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{b}\exp(-\tfrac12u^2)\,du$, we have

$$K_0 = F, \qquad K_1 = bF + f, \qquad (9)$$

from which successive $K$'s can be evaluated by using (8).*

Then, from (7) and (8),

$$J_n = lK_n + mK_{n-1},$$

where $l = b(2 - a^2)$ and $m = 2n + (\rho_n^2 - 2n - 1)a^2$.

* It may be noted that $K_n = n!\,Hh_n(-b)/\sqrt{2\pi}$, where $Hh_n$ is the function tabulated (for $n < 21$) in Table XV of the British Association Mathematical Tables, Vol. I. I am obliged to Dr Harold Hotelling for pointing this out to me.
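The reduction formula lends itself directly to computation. The following is our own minimal Python sketch, not the author's procedure. It assumes the one-sided 5% points $t = 2.132$, $1.833$ and $1.729$ for 4, 9 and 19 degrees of freedom (standard t-table values that the text does not quote), and the function name `p_prime_II` is hypothetical. It builds $K_0$ and $K_1$ from (9), applies (8), forms $J_n = lK_n + mK_{n-1}$ and then evaluates (4).

```python
import math

def p_prime_II(t: float, n: int, rho: float) -> float:
    """First-order coefficient P'_II(alpha, n, rho_n) of gamma_1 in the
    probability of an error of the second kind, via (9), (8), J_n and (4).

    t   -- one-sided critical value of Student's t for n degrees of freedom
    n   -- degrees of freedom (the sample has n + 1 observations), n >= 1
    rho -- rho_n = (mu_1 - mu_0) * sqrt(n + 1) / sigma
    """
    a2 = 1.0 + t * t / n                       # a^2 = 1 + t^2 / n
    a = math.sqrt(a2)
    b = rho * t / (a * math.sqrt(n))           # b = rho_n t / (a sqrt n)

    # (9): f and F are the normal density and distribution function at b.
    f = math.exp(-0.5 * b * b) / math.sqrt(2.0 * math.pi)
    F = 0.5 * (1.0 + math.erf(b / math.sqrt(2.0)))

    # (8): K_k = b K_{k-1} + (k - 1) K_{k-2}, starting from K_0 = F, K_1 = bF + f.
    K = [F, b * F + f]
    for k in range(2, n + 1):
        K.append(b * K[k - 1] + (k - 1) * K[k - 2])

    l_coef = b * (2.0 - a2)
    m_coef = 2.0 * n + (rho * rho - 2.0 * n - 1.0) * a2
    J = l_coef * K[n] + m_coef * K[n - 1]

    # (4): divide by 2^{(n-2)/2} Gamma(n/2) 6 sqrt(n+1) a^{n+2}.
    denom = (2.0 ** ((n - 2) / 2.0) * math.gamma(n / 2.0)
             * 6.0 * math.sqrt(n + 1) * a ** (n + 2))
    return math.exp(-rho * rho / (2.0 * a2)) * J / denom


# Assumed one-sided 5% points of t for n = 4, 9, 19 degrees of freedom.
T_05 = {4: 2.132, 9: 1.833, 19: 1.729}
for n, t in T_05.items():
    print(n, [round(p_prime_II(t, n, float(r)), 4) for r in range(7)])
```

With these $t$ values the $\rho_n = 0$ results agree closely with the first column of Table 1. The recursion involves only multiplications and additions of positive terms when $b \ge 0$, which is presumably why it was preferred to summing $n + 2$ incomplete $\Gamma$-functions.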




The values of $P'_{II}(\alpha, n, \rho_n)$ were calculated, by way of illustration, for $\alpha = 0.05$ (Table 1) and $\alpha = 0.01$ (Table 2), for $n = 4$, 9 and 19, and for all integral values of $\rho_n$ for which $P'_{II}$ exceeded 0.001 in absolute value. Geary (1936) has tabulated the values of his correction to the distribution function for these values of $n$. The value of $-P'_{II}$ for $\rho_n = 0$ must, of course, agree with that in Geary's table for the same $n$ and the appropriate $t$. The accuracy of the results is conditioned by the accuracy of the value of $t$ obtained from the probability table, but the effect of subsequent rounding errors has been rendered negligible by retaining six or seven significant figures in the calculations. Under the circumstances, the results given should be correct except for the effect of rounding-off errors on the fourth decimal place.

Table 1. $P'_{II}$ for $\alpha = 0.05$

 n    ρ_n:  0          1          2          3          4          5          6
 4       -0.0343    -0.0775    -0.0361    +0.0559    +0.0649    +0.0260    +0.0051
 9       -0.0297    -0.0628    -0.0046    +0.0597    +0.0340    +0.0062       –
19       -0.0229    -0.0450    +0.0048    +0.0445    +0.0191    +0.0025       –


Table 2. $P'_{II}$ for $\alpha = 0.01$

 n    ρ_n:  0          1          2          3          4
 4          ...      -0.0418    -0.0771    -0.0619    +0.0066
 9       -0.0115       ...        ...      +0.0012    +0.0731
19       -0.0098       ...      -0.0523    +0.0270    +0.0569

 n    ρ_n:  5          6          7          8          9
 4       +0.0616    +0.0646    +0.0387      ...      +0.0045
 9       +0.0496    +0.0130    +0.0016       –          –
19       +0.0207    +0.0026       –          –          –

Entries shown as ... are illegible in the source.




Effect on the power function 

Following Geary's 1936 paper, it has been assumed that the $\gamma_1$ of the population in question is sufficiently small for our purpose; but it has not been possible to define the range of permissible values of $\gamma_1$. It is proposed to consider this question, and also that of improving the approximation to the power function, in a further publication.

In order to get an idea of the magnitude of the effect of departure from normality, the values of the power for $\gamma_1 = 0$ (normal distribution) and $\gamma_1 = 0.4$, for $n = 9$ and 19 and some values of $\rho_n$, are given in Table 3, assuming that the effect of higher powers of $\gamma_1$ is negligible. In this respect it may be mentioned that, on the wider assumption used in Geary's













1947 paper, the actual probability of $t$ lying to the left of the lower 2.5% point of normal theory is 0.041 in a sample of 10 from a Pearsonian population with $\gamma_1 = 0.5$ and $\gamma_2 = 0$, whereas, according to the assumption on which the present paper is based, the same probability is 0.035. It has therefore been assumed for the present that the approximation used in this paper also gives a fair estimate of the power function when $\gamma_1 = 0.4$.


Table 3. Comparison of the power when $\gamma_1 = 0$ and $\gamma_1 = 0.4$

                     α = 0.05                              α = 0.01
            n = 9              n = 19             n = 9              n = 19
ρ_n    γ1 = 0  γ1 = 0.4   γ1 = 0  γ1 = 0.4   γ1 = 0  γ1 = 0.4   γ1 = 0  γ1 = 0.4
 0     0.050    0.038     0.050    0.041     0.010    0.006     0.010    0.006
 1     0.236    0.211     0.248    0.230     0.071    0.051     0.081    0.063
 2     0.580    0.578     0.612    0.614     0.268    0.238     0.320    0.299
 3     0.868    0.892     0.894    0.912     0.587    0.587     0.677    0.688
 4     0.979    0.993     0.986    0.994     0.853    0.882     0.916    0.939
 5                                           0.969    0.989     0.989    0.997


The effect of positive skewness is to decrease the power in the region of less power and 
increase it everywhere else, the opposite being true of negative skewness. It is seen from 
Table 3 that the change in the power is not such as to affect materially the inference drawn 
from the test. How far this is so for greater departures from normality, it is not possible to 
determine by the present method. 
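The $\gamma_1 = 0.4$ columns of Table 3 follow from the normal-theory columns and Table 1 through power $= (1 - P^0_{II}) + \gamma_1 P'_{II}$. The arithmetic for $\alpha = 0.05$ can be reproduced in a few lines (a sketch; the values are transcribed from Tables 1 and 3, and the names are ours):

```python
# Normal-theory power (gamma_1 = 0), alpha = 0.05, rho_n = 0..4 (Table 3).
NORMAL_POWER = {9: [0.050, 0.236, 0.580, 0.868, 0.979],
                19: [0.050, 0.248, 0.612, 0.894, 0.986]}
# First-order coefficients P'_II, alpha = 0.05, rho_n = 0..4 (Table 1).
P_PRIME = {9: [-0.0297, -0.0628, -0.0046, 0.0597, 0.0340],
           19: [-0.0229, -0.0450, 0.0048, 0.0445, 0.0191]}

def corrected_power(n, gamma1=0.4):
    """Power to first order in gamma_1, (1 - P0_II) + gamma_1 * P'_II, to 3 decimals."""
    return [round(p0 + gamma1 * c, 3) for p0, c in zip(NORMAL_POWER[n], P_PRIME[n])]

for n in (9, 19):
    print(n, corrected_power(n))
```

The sign pattern of $P'_{II}$ (negative for small $\rho_n$, positive beyond) is what produces the lowered power at small $\rho_n$ and raised power elsewhere noted above for positive skewness.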

Conclusion 

It has thus been found possible to obtain an expression, though approximate, for the power of Student's t-test applied to samples from an asymmetrical universe when the critical region has been determined on the erroneous assumption of normality in the parent population.

The results derived are true if the fourth and higher cumulants are zero and the standardized third cumulant is sufficiently small; subject to these restrictions, the change in the power of the test is found to be negligible as far as the inference to be drawn is concerned.


My thanks are due to Dr N. R. Tawde of the Royal Institute of Science, Bombay, and 
Prof. G. S. Priolker of the Wilson College, Bombay, for allowing me to use their calculating 
machines. 

REFERENCES 

Bartlett, M. S. (1935). Proc. Camb. Phil. Soc. 31, 223.

Geary, R. C. (1936). J.R. Statist. Soc. Suppl. 3, 178.

Geary, R. C. (1947). Biometrika, 34, 209.

Laderman, J. (1939). Ann. Math. Statist. 10, 376.

Nair, A. N. K. (1941). Sankhya, 5, 393.

Neyman, J. et al. (1935). J.R. Statist. Soc. Suppl. 2, 107.

Neyman, J. & Tokarska, B. (1936). J. Amer. Statist. Ass. 31, 318.

Pearson, E. S. & Adyanthaya, N. K. (1929). Biometrika, 21, 259.

Perlo, V. (1933). Biometrika, 25, 203.

Rider, P. R. (1929). Biometrika, 21, 124.

Rider, P. R. (1931). Ann. Math. Statist. 2, 48.

Rietz, H. L. (1939). Ann. Math. Statist. 10, 265.






TABLES OF SYMMETRIC FUNCTIONS. PART I


By F. N. DAVID and M. G. KENDALL


1. Symmetric functions have important applications in the theory of probability and the theory of sampling. There are four types of symmetric function in use, and we shall also employ a fifth. They are as follows:

(a) The monomial symmetric functions, typified by

$$(p_1^{\pi_1} p_2^{\pi_2} \cdots p_s^{\pi_s}) = \sum x_i^{p_1} x_j^{p_1} \cdots x_q^{p_s} x_r^{p_s} \cdots, \qquad (1)$$

where the summation takes place over all suffixes $i, j, \ldots, q, r, \ldots$ which are different, each distinct term appearing once. The number $\sum_{j=1}^{s}\pi_j = \varpi$, say, is called the order of the symmetric function and $\sum_{j=1}^{s} p_j\pi_j = w$, say, is called its weight. There are $n!/\{(n - \varpi)!\,\pi_1!\cdots\pi_s!\}$ terms in the summation on the right of (1), where $n$ is the number of possible different suffixes.

(b) The unitary symmetric functions $(1^r)$, denoted by $a_r$. We have, from equation (1),

$$a_r = (1^r) = \sum x_i x_j \cdots x_v, \qquad (2)$$

where the subscripts are all different and there are $r$ of them in any one term of the sum. $a_r$ may also be defined by the identity in $t$

$$T = (t - x_1)(t - x_2)\cdots(t - x_n) = t^n - a_1t^{n-1} + a_2t^{n-2} - \cdots + (-1)^n a_n. \qquad (3)$$

(c) The one-part symmetric functions or power-sums, defined by

$$s_r = (r) = \sum_{i=1}^{n} x_i^r.$$

These also are a special case of (1).

(d) The homogeneous product-sums $h_r$, defined by

$$\frac{1}{(1 - x_1t)(1 - x_2t)\cdots(1 - x_nt)} = 1 + h_1t + \cdots + h_rt^r + \cdots. \qquad (4)$$

We shall not use these functions. They are mentioned for the sake of completeness.

(e) The "augmented" functions

$$[p_1^{\pi_1} \cdots p_s^{\pi_s}] = (p_1^{\pi_1} \cdots p_s^{\pi_s})\,\pi_1! \cdots \pi_s!. \qquad (5)$$

For many statistical purposes these are more convenient than the ordinary monomials.

2. The tables below show the products of power-sums of weight $w$ in terms of the augmented symmetries and vice versa, up to and including weight 12. Each table is given the number 1 (because further sets of tables are contemplated) followed by its weight; e.g. Table 1.5 relates to weight 5. To express the power-sums in terms of augmented symmetries, the tables are read horizontally up to and including the diagonal, the unit entry in which is shown in heavy type. For example, from Table 1.5,

$$(2)^2(1) = [5] + [41] + 2[32] + [2^21]. \qquad (6)$$

The augmented symmetries are given in terms of power-sums by reading the tables vertically from the top, again up to and including the diagonal. For example, from Table 1.5,

$$[31^2] = 2(5) - 2(4)(1) - (3)(2) + (3)(1)^2. \qquad (7)$$
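Relations (6) and (7) can be checked numerically by computing augmented symmetric functions straight from their definition, as sums over ordered choices of distinct suffixes. A brute-force sketch with exact integer arithmetic on a small hypothetical sample (function names are ours):

```python
import random
from itertools import permutations

def power_sum(xs, r):
    """One-part symmetric function (r): the sum of x_i ** r."""
    return sum(x ** r for x in xs)

def augmented(xs, parts):
    """Augmented symmetric function [p1 p2 ...]: the sum of x_i**p1 * x_j**p2 * ...
    over all ordered choices of distinct suffixes i, j, ... (brute force)."""
    total = 0
    for idx in permutations(range(len(xs)), len(parts)):
        term = 1
        for i, p in zip(idx, parts):
            term *= xs[i] ** p
        total += term
    return total

random.seed(12345)
xs = [random.randint(-6, 6) for _ in range(7)]   # hypothetical sample values

def s(r):
    return power_sum(xs, r)

def A(*parts):
    return augmented(xs, parts)

lhs6, rhs6 = s(2) ** 2 * s(1), A(5) + A(4, 1) + 2 * A(3, 2) + A(2, 2, 1)
lhs7, rhs7 = A(3, 1, 1), 2 * s(5) - 2 * s(4) * s(1) - s(3) * s(2) + s(3) * s(1) ** 2
print("(6) holds:", lhs6 == rhs6, "  (7) holds:", lhs7 == rhs7)
```

Summing over ordered distinct suffixes is exactly what multiplies the monomial $(p_1^{\pi_1}\cdots)$ by $\pi_1!\cdots\pi_s!$ in (5).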


Biometrika 36 




Uses of the tables 

3. A minor use of the tables may be noted. The expression of $(1)^r$ in terms of the augmented symmetries gives the expansion of moments about an arbitrary point in terms of cumulants; e.g. from the bottom line of Table 1.5 we have

$$\mu'_5 = \kappa_5 + 5\kappa_4\kappa_1 + 10\kappa_3\kappa_2 + 10\kappa_3\kappa_1^2 + 15\kappa_2^2\kappa_1 + 10\kappa_2\kappa_1^3 + \kappa_1^5. \qquad (8)$$

If we require moments about the mean, we simply ignore any term containing a unit part (since $\kappa_1 = 0$ there); e.g.

$$\mu_5 = \kappa_5 + 10\kappa_3\kappa_2. \qquad (9)$$
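The coefficients in (8) count the ways of splitting five objects into blocks of the given sizes, and the same numbers come out of the standard recursion $\mu'_r = \sum_{k=1}^{r}\binom{r-1}{k-1}\kappa_k\mu'_{r-k}$. A sketch comparing the recursion with (8) and (9); the function names and test cumulants are ours:

```python
from math import comb

def raw_moments_from_cumulants(kappa):
    """kappa[r] is the r-th cumulant (kappa[0] unused). Returns mu'[0..R] from
    mu'_r = sum_{k=1}^{r} C(r-1, k-1) * kappa_k * mu'_{r-k}, with mu'_0 = 1."""
    R = len(kappa) - 1
    mu = [1] + [0] * R
    for r in range(1, R + 1):
        mu[r] = sum(comb(r - 1, k - 1) * kappa[k] * mu[r - k] for k in range(1, r + 1))
    return mu

def mu5_eq8(k1, k2, k3, k4, k5):
    """Equation (8): the fifth moment about an arbitrary point."""
    return (k5 + 5 * k4 * k1 + 10 * k3 * k2 + 10 * k3 * k1 ** 2
            + 15 * k2 ** 2 * k1 + 10 * k2 * k1 ** 3 + k1 ** 5)

print(raw_moments_from_cumulants([0, 2, 3, 5, 7, 11])[5], mu5_eq8(2, 3, 5, 7, 11))
# With kappa_1 = 0 only the terms of (9) survive: kappa_5 + 10 kappa_3 kappa_2.
print(raw_moments_from_cumulants([0, 0, 3, 5, 7, 11])[5], 11 + 10 * 5 * 3)
```

With every cumulant equal to 1 (a Poisson parent with unit mean) each coefficient simply counts partitions, so $\mu'_5$ becomes the total number of partitions of five objects, 52.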

4. In statistical investigations concerning sampling moments we often need the expectations of products of power-sums. For samples from an infinite population these can be written down at sight from Table 1 when it is remembered that

$$E[p_1^{\pi_1} \cdots p_s^{\pi_s}] = n^{(\varpi)}(\mu'_{p_1})^{\pi_1} \cdots (\mu'_{p_s})^{\pi_s}, \qquad (10)$$

where $n^{(m)} = n(n-1)\cdots(n-m+1)$.

Thus, from (6),

$$E\{(2)^2(1)\} = n\mu'_5 + n(n-1)\mu'_4\mu'_1 + 2n(n-1)\mu'_3\mu'_2 + n(n-1)(n-2)\mu_2'^2\mu'_1. \qquad (11)$$
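Equation (11) can be confirmed exactly for a small discrete parent by enumerating every possible sample. A sketch, assuming a hypothetical parent with equally likely values 0, 1, 3 and $n = 4$; drawing with replacement makes the observations independent, as for an infinite population:

```python
from fractions import Fraction
from itertools import product

values = [0, 1, 3]   # hypothetical parent: each value equally likely
n = 4                # sample size

def mu(r):
    """Parent moment about the origin, mu'_r."""
    return Fraction(sum(v ** r for v in values), len(values))

# Exact E{(2)^2 (1)}: average over all len(values) ** n equally likely samples.
total = 0
for sample in product(values, repeat=n):
    s1 = sum(sample)
    s2 = sum(x * x for x in sample)
    total += s2 * s2 * s1
exact = Fraction(total, len(values) ** n)

# Equation (11).
formula = (n * mu(5) + n * (n - 1) * mu(4) * mu(1) + 2 * n * (n - 1) * mu(3) * mu(2)
           + n * (n - 1) * (n - 2) * mu(2) ** 2 * mu(1))
print(exact, formula, exact == formula)
```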

5. The same kind of procedure connects multivariate moments about an arbitrary point with multivariate cumulants. For example, corresponding to (6) we have

$$\mu'(2^21) = \kappa(2^21) + \kappa(2^2)\kappa(1) + 2\kappa(21)\kappa(2) + \{\kappa(2)\}^2\kappa(1). \qquad (12)$$

Conversely, from (7),

$$\kappa(31^2) = 2\mu'(3)\{\mu'(1)\}^2 - 2\mu'(31)\mu'(1) - \mu'(3)\mu'(1^2) + \mu'(31^2). \qquad (13)$$

Equations of type (13) are particularly useful.

6. Table 1 may also be used to express k-statistics in terms of the one-part symmetries $(r)$, namely in terms of the power-sums which are used to calculate them in practice. The $r$th k-statistic is the symmetric function of the sample values whose expectation is equal to the parent cumulant of order $r$. Thus, corresponding to the inverse of (8), we have

$$k_5 = \frac{[5]}{n} - \frac{5[41]}{n(n-1)} - \frac{10[32]}{n(n-1)} + \frac{20[31^2]}{n(n-1)(n-2)} + \frac{30[2^21]}{n(n-1)(n-2)} - \frac{60[21^3]}{n(n-1)(n-2)(n-3)} + \frac{24[1^5]}{n(n-1)(n-2)(n-3)(n-4)}. \qquad (14)$$

Substituting for the augmented symmetries by power-sums, written as $s_r$, from Table 1.5, and collecting terms, we get

$$k_5 = \frac{1}{n(n-1)(n-2)(n-3)(n-4)}\{(n^4 + 5n^3)s_5 - 5(n^3 + 5n^2)s_4s_1 - 10(n^3 - n^2)s_3s_2 + 20(n^2 + 2n)s_3s_1^2 + 30(n^2 - n)s_2^2s_1 - 60ns_2s_1^3 + 24s_1^5\}. \qquad (15)$$
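Since $k_5$ is defined by the property that its expectation is $\kappa_5$, (15) can be tested by averaging it over every possible sample from a small discrete parent. A sketch with exact rational arithmetic; the parent {0, 1, 3} with equal probabilities and $n = 5$ are arbitrary choices, and the function names are ours:

```python
from fractions import Fraction
from itertools import product

def k5(xs):
    """Fifth k-statistic from the power sums s_1..s_5, equation (15)."""
    n = len(xs)
    s1, s2, s3, s4, s5 = (sum(Fraction(x) ** r for x in xs) for r in range(1, 6))
    num = ((n**4 + 5 * n**3) * s5 - 5 * (n**3 + 5 * n**2) * s4 * s1
           - 10 * (n**3 - n**2) * s3 * s2 + 20 * (n**2 + 2 * n) * s3 * s1**2
           + 30 * (n**2 - n) * s2**2 * s1 - 60 * n * s2 * s1**3 + 24 * s1**5)
    return num / (n * (n - 1) * (n - 2) * (n - 3) * (n - 4))

def kappa5(values):
    """Fifth cumulant of a parent giving equal probability to each of `values`."""
    mu = [sum(Fraction(v) ** r for v in values) / len(values) for r in range(6)]
    return (mu[5] - 5 * mu[4] * mu[1] - 10 * mu[3] * mu[2] + 20 * mu[3] * mu[1]**2
            + 30 * mu[2]**2 * mu[1] - 60 * mu[2] * mu[1]**3 + 24 * mu[1]**5)

values, n = [0, 1, 3], 5
samples = list(product(values, repeat=n))          # all equally likely samples
mean_k5 = sum(k5(smp) for smp in samples) / len(samples)
print(mean_k5, kappa5(values), mean_k5 == kappa5(values))
```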

7. The tables provide a method of evaluating the sampling cumulants of k-statistics (or of any symmetric functions of the observations) by the direct evaluation of expectations. The alternative combinatorial method (Fisher, 1929; Kendall, 1941) is shorter in performance but needs careful handling in view of the ease with which certain combinations can be overlooked. Kendall (1941) lists most of the cumulants of k-statistics up to order 12 but does not give $\kappa(3^32)$. We will sketch the derivation of this quantity as an illustration of the use of the tables.



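By relations derived as in § 6 we have

$$k_3 = \frac{n^2}{(n-1)(n-2)}\left(\frac{s_3}{n} - \frac{3s_2s_1}{n^2} + \frac{2s_1^3}{n^3}\right), \qquad k_2 = \frac{n}{n-1}\left(\frac{s_2}{n} - \frac{s_1^2}{n^2}\right),$$

and hence

$$k_3^3k_2 = \frac{1}{n^4(n-1)^4(n-2)^3}\{n^7s_3^3s_2 - n^6(s_3^3s_1^2 + 9s_3^2s_2^2s_1) + n^5(15s_3^2s_2s_1^3 + 27s_3s_2^3s_1^2) - n^4(6s_3^2s_1^5 + 63s_3s_2^2s_1^4 + 27s_2^4s_1^3) + n^3(48s_3s_2s_1^6 + 81s_2^3s_1^5) - n^2(12s_3s_1^8 + 90s_2^2s_1^7) + 44ns_2s_1^9 - 8s_1^{11}\}. \qquad (16)$$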
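The expansion (16) is routine but error-prone; it can be verified by comparing it, in exact arithmetic, with the product of the expressions for $k_3$ and $k_2$ on arbitrary data. A sketch (function names and data are ours):

```python
import random
from fractions import Fraction

def k2(xs):
    n, s1, s2 = len(xs), sum(xs), sum(x * x for x in xs)
    return Fraction(n * s2 - s1 ** 2, n * (n - 1))

def k3(xs):
    n = len(xs)
    s1, s2, s3 = (sum(x ** r for x in xs) for r in (1, 2, 3))
    return Fraction(n * n * s3 - 3 * n * s2 * s1 + 2 * s1 ** 3, n * (n - 1) * (n - 2))

def k3_cubed_k2_eq16(xs):
    """Right-hand side of equation (16), grouped by powers of n."""
    n = len(xs)
    s1, s2, s3 = (sum(x ** r for x in xs) for r in (1, 2, 3))
    num = (n**7 * s3**3 * s2
           - n**6 * (s3**3 * s1**2 + 9 * s3**2 * s2**2 * s1)
           + n**5 * (15 * s3**2 * s2 * s1**3 + 27 * s3 * s2**3 * s1**2)
           - n**4 * (6 * s3**2 * s1**5 + 63 * s3 * s2**2 * s1**4 + 27 * s2**4 * s1**3)
           + n**3 * (48 * s3 * s2 * s1**6 + 81 * s2**3 * s1**5)
           - n**2 * (12 * s3 * s1**8 + 90 * s2**2 * s1**7)
           + 44 * n * s2 * s1**9 - 8 * s1**11)
    return Fraction(num, n**4 * (n - 1)**4 * (n - 2)**3)

random.seed(7)
xs = [random.randint(-9, 9) for _ in range(8)]   # arbitrary integer data
print(k3(xs) ** 3 * k2(xs) == k3_cubed_k2_eq16(xs))
```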


We take expectations of both sides. The terms on the right may be written down in terms of parent $\mu$'s after the manner of § 4. These $\mu$'s are then transformed to parent $\kappa$'s by the known relations connecting them, for example those of § 3. The result gives us $\mu'(3^32)$, and the corresponding $\kappa$ is then derived from the equation

$$\kappa(3^32) = \mu'(3^32) - 3\kappa_3\kappa(3^22) - \kappa_2\kappa(3^3) - 3\kappa(3^2)\kappa(32) - 3\kappa(32)\kappa_3^2 - 3\kappa_2\kappa_3\kappa(3^2) - \kappa_3^3\kappa_2, \qquad (17)$$

which is itself derived from the tables as described in § 6. The $\kappa$'s of lower order occurring in this expression have already been tabulated, and we find, after carrying out the necessary substitutions and reductions,


x(3 3 2) = ^ + 


45 


n 3 n\n— 1) 


^9^2 + 


9(17n —26) 
n\n-Y? 


K&Ki 


27(ll77 2 -3l77 + 2 2) ^ 9(49 t7 2 - 134 77 + 103) 

n 2 (n— l) 3 ^7^4+ ^{n— l) 3 K * Kt 


+ 


54(12?i-23) . 54(63re 2 - 22077+178) 

77(77 — l) 3 (77 — 2) 


x 7 xi+- 


+ 


+ 


77(77— l) 2 (77 — 2) 
54(93?7 2 —34077+316) 


k 6 k 3 k z 


*5*4*2 + 


54(7lT7 a —. 421772+84277 - 564) 


Kr, K% 


77(?7—l) s (?7—2) I 77(77 - 1)2 (77-2)2 

108(41t7 3 — 257t7 2 + 64377 — 390) 2 _ , 108(33ti 2 - 12277+110) _ _ 3 

/ 1 / r\\0 *4*3 ^ / -i\o/ c\\0 **ft *S 


77(77— l) 3 (77— 2) 2 


*4*3*1 + 


, C48(29t7 2 - 12677+ 137) 

+ (77-l) 3 (77-2) 2 ****** 

, 129677(577- 12) 4 

+ (77-l) 3 (?7-2)2^^' 


(?7 — l ) 3 (?7 — 2)2 

324(2977 2 -13677+164) 3 

( 77 - 1 ) 3 ( 77 - 2) 2 ■*“** 


(18) 


Method of construction of the tables

8. That part of the tables which expresses the products of power-sums in terms of the augmented symmetries (the part below the main diagonal of the tables) was constructed by building up, for a given weight, from the lower weights. For instance, the expression of (3)(2)(1) in Table 1.6 may be derived in three ways: by multiplying (3)(2) by (1), by multiplying (3)(1) by (2), and by multiplying (2)(1) by (3). From Table 1.3 we have

(2)(1) = [3] + [21]. 



Hence

$$(3)(2)(1) = (3)[3] + (3)[21] = [6] + [3^2] + [51] + [42] + [321]. \qquad (19)$$

Again, from Table 1.4, $(3)(1) = [4] + [31]$. Hence

$$(2)(3)(1) = (2)[4] + (2)[31] = [6] + [42] + [51] + [3^2] + [321],$$

agreeing with (19).
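The building-up rule of § 8 is mechanical: multiplying $(p)$ into $[q_1 q_2 \ldots q_k]$ either adds $p$ to one of the existing parts or appends $p$ as a new part. A sketch of that rule, assuming our own data layout (a Counter mapping a sorted tuple of parts to its coefficient), which reproduces (19) from all three orders of multiplication and also (6):

```python
from collections import Counter

def times_power_sum(p, aug):
    """Multiply the power-sum (p) into a combination of augmented symmetric
    functions: (p)[q1 ... qk] = sum_j [q1 .. qj + p .. qk] + [q1 ... qk p]."""
    out = Counter()
    for parts, coef in aug.items():
        for j in range(len(parts)):                  # p joins an existing suffix
            merged = list(parts)
            merged[j] += p
            out[tuple(sorted(merged, reverse=True))] += coef
        out[tuple(sorted(parts + (p,), reverse=True))] += coef   # p on a new suffix
    return out

def product_of_power_sums(*ps):
    """(p1)(p2)... expressed in augmented symmetric functions."""
    aug = Counter({(ps[0],): 1})
    for p in ps[1:]:
        aug = times_power_sum(p, aug)
    return aug

eq19 = {(6,): 1, (3, 3): 1, (5, 1): 1, (4, 2): 1, (3, 2, 1): 1}
for order in [(3, 2, 1), (2, 3, 1), (1, 2, 3)]:
    print(order, dict(product_of_power_sums(*order)) == eq19)
print(dict(product_of_power_sums(2, 2, 1)))          # equation (6)
```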

9. Each line was checked in this manner, by being calculated in two different ways by one of us (M.G.K.), and was independently checked by the other (F.N.D.) by symbolic operation. Any product of the one-part functions may be expressed as a sum of monomial symmetries; for example,

$$(3)(2)(1) = A(6) + B(51) + C(42) + E(3^2) + F(41^2) + G(321) + H(2^3) + I(31^3) + J(2^21^2) + K(21^4) + L(1^6), \qquad (20)$$

where $A$, $B$, etc., are non-negative constants to be determined; we do not use $D$ because we require it for a different purpose. In fact, using MacMahon's D-operator technique we have, for example,

$$D_6(3)(2)(1) = 1 = A,$$

all other terms vanishing. Similar operations with $D_5D_1$, $D_4D_2$, $D_3^2$, $D_3D_2D_1$, etc., give

$$A = B = C = G = 1, \qquad E = 2,$$

and all other coefficients zero. Conversion into the [ ] functions then gives equation (19).

This was the method of check actually used. It may be noted that any table of given weight can be built up more speedily from those of lower weight by a similar process; e.g. if we operate on equation (20) by $D_1$, we get

$$(3)(2) = B(5) + F(41) + G(32) + I(31^2) + J(2^21) + K(21^3) + L(1^5),$$

and the values of the coefficients may be read off from the table for weight 5. Operation by $D_2$ gives

$$(3)(1) = C(4) + G(31) + H(2^2) + J(21^2) + K(1^4),$$

and the values of those coefficients not given by weight 5 are found from the table for weight 4. This method was not used as a check because it only repeats the process described in § 8 and was therefore considered to be insufficient.

10. The complementary part of the tables, expressing the augmented symmetries in terms of power-sums, was constructed by inverting the relations below the main diagonal. Consider, for instance, Table 1.8. Starting in the cell with row (7)(1) and column [71], we complete the column [71]. We then proceed to the cell in row (6)(2), column [62], and, working upwards, complete the column [62]; and so on from column to column. A check is provided by the fact that the number in the top row (8) must be $(-1)^{p-1}(p-1)!$, where $p$ is the number of parts in the item at the head of the column. There are various other relations between the numbers which act as a check on any doubtful figure.
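The top-row check has a closed-form explanation. Writing $[p_1 \ldots p_k]$ as a sum over all partitions of its $k$ parts into blocks, each block $B$ contributes the power-sum of its combined weight and a factor $(-1)^{|B|-1}(|B|-1)!$. This is the standard Möbius inversion over set partitions; we name it here as our own gloss, not as the authors' method. When all $p$ parts fall into a single block the factor is $(-1)^{p-1}(p-1)!$, which is the rule quoted above. A sketch (function names are ours):

```python
from collections import Counter
from math import factorial

def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        for i in range(len(smaller)):                # put `first` into an existing block
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        yield [[first]] + smaller                     # or into a block of its own

def augmented_in_power_sums(parts):
    """[p1 ... pk] in power-sums: {sorted tuple of power-sum orders: coefficient}."""
    out = Counter()
    for blocks in set_partitions(list(parts)):
        coef = 1
        for b in blocks:
            coef *= (-1) ** (len(b) - 1) * factorial(len(b) - 1)
        out[tuple(sorted((sum(b) for b in blocks), reverse=True))] += coef
    return {k: v for k, v in out.items() if v != 0}

print(augmented_in_power_sums([3, 1, 1]))             # equation (7)
print(augmented_in_power_sums([1] * 8)[(8,)])         # top-row entry for [1^8]
```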




11. Again, an independent check was applied. For example, analogously to equation (20), we have

$$(321) = A'(6) + B'(5)(1) + C'(4)(2) + E'(3)^2 + F'(4)(1)^2 + G'(3)(2)(1) + H'(2)^3 + I'(3)(1)^3 + J'(2)^2(1)^2 + K'(2)(1)^4 + L'(1)^6. \qquad (21)$$

By carrying out a series of operations, each time getting zero on the left-hand side, we obtain a set of equations which may be solved for the constant coefficients. This may sound formidable, but in actual practice so many of the coefficients are zero and so many short cuts may be devised that the procedure was not found unmanageable.

12. The present tables are complete for weights of 12 or lower. Complete tables for higher weights would involve a great deal of additional labour and more printing space than their usefulness would justify. It may be noted, however, that expressions for higher weights can be obtained from those for lower weights, when necessary, by the symbolic method exemplified in § 9. A similar method can be used for the converse relations.

13. We should like to congratulate the Press on the way in which the tables are 
arranged and on the uniform excellence of their type-setting. 

REFERENCES 

Fisher, R. A. (1929). Moments and product-moments of sampling distributions. Proc. Lond. Math. Soc. (2), 30, 199.

Kendall, M. G. (1941). The Advanced Theory of Statistics, 1. 4th edition, 1948. Charles Griffin and Co.

MacMahon, P. A. (1915). Combinatory Analysis, 1. Cambridge University Press.


Table 1.2 Table 1.3 






















Tables of symmetric functions 


Table 1.7 


































































Table 1.9 






















































































































































































Table 1.11 











Table 1.11 (cont.)










































































































































































































Table 1.12 





















































































































Table 1.12 (cont.)























































Table 1.12 (cont.)






























































MISCELLANEA 


On the efficiency of the method of moments and Neyman's type A distribution

By L. R. SHENTON 


In a paper by Neyman (1939) several types of compound Poisson distributions are derived and their
application to empirical data mentioned. Neyman remarks that the method of fitting requires investigation.
It is the object of this note to consider the efficiency of the method of moments when applied to Neyman's
type A distribution of two parameters.

The efficiency of the method of moments, with particular reference to parameters of scale and location,
has been discussed by R. A. Fisher (1921), in connexion with the Pearson system of frequency curves.
More recently Fisher (1941) has used the covariance and information matrices to find an expression for
the efficiency of the method of moments applied to the negative binomial distribution. The chief difficulty
appears to be the evaluation of the information determinant, and the process given here may have
applications in other cases.

1. Neyman's type A distribution


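The probability generating function is given by

$$\exp\{-m_1(1 - e^{-m_2(1-t)})\} = \sum_{x=0}^{\infty} P_x t^x, \qquad (1)$$

where

$$P_x = \frac{e^{-m_1} m_2^x}{x!}\left\{0^x + \frac{A'\,1^x}{1!} + \frac{A'^2\,2^x}{2!} + \frac{A'^3\,3^x}{3!} + \dots\right\}, \qquad (2)$$

with

$$A' = m_1 e^{-m_2}, \qquad 0^x = 1 \ (x = 0), \qquad 0^x = 0 \ (x \neq 0).$$

The cumulants $K_r$ are given by

$$\sum_{r=1}^{\infty} K_r \frac{t^r}{r!} = m_1 \sum_{s=1}^{\infty} \frac{m_2^s (e^t - 1)^s}{s!}. \qquad (3)$$

The first few are recorded for later use:

$$K_1 = m_1 m_2, \qquad K_2 = m_1 m_2 (1 + m_2),$$
$$K_3 = m_1 m_2 (1 + 3m_2 + m_2^2),$$
$$K_4 = m_1 m_2 (1 + 7m_2 + 6m_2^2 + m_2^3),$$
$$K_5 = m_1 m_2 (1 + 15m_2 + 25m_2^2 + 10m_2^3 + m_2^4),$$
$$K_6 = m_1 m_2 (1 + 31m_2 + 90m_2^2 + 65m_2^3 + 15m_2^4 + m_2^5).$$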
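The series (2) can be checked numerically. The sketch below uses illustrative values $m_1 = 2$ and $m_2 = 1.5$ (our choice) and truncates the sum over $k$ at an assumed `k_max`. It confirms that the probabilities total 1 and that they reproduce $K_1$, $K_2$ and $K_3$ from the list above (the third central moment equals $K_3$).

```python
from math import exp, factorial


def neyman_a_pmf(x, m1, m2, k_max=200):
    """P_x from the series (2), truncated after k_max terms."""
    a = m1 * exp(-m2)              # A' = m1 e^{-m2}
    total, term = 0.0, 1.0         # term holds A'**k / k!
    for k in range(k_max + 1):
        if k > 0:
            term *= a / k
        total += term * k ** x     # 0 ** 0 == 1 in Python, matching the convention 0^0 = 1
    return exp(-m1) * m2 ** x / factorial(x) * total


m1, m2 = 2.0, 1.5
probs = [neyman_a_pmf(x, m1, m2) for x in range(80)]
mean = sum(x * p for x, p in enumerate(probs))
var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
mu3 = sum((x - mean) ** 3 * p for x, p in enumerate(probs))
print(sum(probs), mean, var, mu3)   # compare with 1, K1 = 3, K2 = 7.5, K3 = 23.25
```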


The relations

$$K_{r+1} = m_2\left\{K_r + rK_{r-1} + \binom{r}{2}K_{r-2} + \dots + rK_1 + m_1\right\} \qquad (4)$$

and

$$K_{r+1} = m_2\left(K_r + \frac{\partial K_r}{\partial m_2}\right) \qquad (5)$$

are easily proved from (3), the second expression being useful if high-order cumulants are required.
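Relation (5) lends itself to mechanical use. In the sketch below, $K_r/m_1$ is stored as a coefficient list in powers of $m_2$, and (5) is applied repeatedly. The result is then checked against (4) at illustrative values $m_1 = 2$, $m_2 = 1.5$. The coefficients produced are the Stirling numbers of the second kind. The helper names are ours.

```python
from math import comb, isclose


def cumulant_polys(r_max):
    """Coefficients of K_r / m1 as polynomials in m2 (index j <-> m2**j), r = 1..r_max,
    generated by relation (5): K_{r+1} = m2 * (K_r + dK_r/dm2)."""
    polys = [[0, 1]]                          # K_1 / m1 = m2
    while len(polys) < r_max:
        prev = polys[-1]
        nxt = [0] * (len(prev) + 1)
        for j, cj in enumerate(prev):
            nxt[j + 1] += cj                  # m2 * (cj m2**j)
            nxt[j] += j * cj                  # m2 * d(cj m2**j)/dm2
        polys.append(nxt)
    return polys


def cumulants(m1, m2, r_max):
    """[K_1, ..., K_r_max] evaluated at (m1, m2)."""
    return [m1 * sum(cj * m2 ** j for j, cj in enumerate(c)) for c in cumulant_polys(r_max)]


print(cumulant_polys(6)[-1])                  # -> [0, 1, 31, 90, 65, 15, 1]

# Cross-check against relation (4), where the final term is m1:
m1, m2 = 2.0, 1.5
K = cumulants(m1, m2, 7)                      # K[i] holds K_{i+1}
for r in range(1, 7):
    rhs = m2 * (sum(comb(r, j) * K[r - 1 - j] for j in range(r)) + m1)
    print(r + 1, isclose(K[r], rhs, rel_tol=1e-12))
```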
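By the method of moments $m_1$ and $m_2$ are estimated from

$$\lambda_1 = m_1 m_2, \qquad \lambda_2 = m_1 m_2 (1 + m_2),$$

where $\lambda_1$ and $\lambda_2$ are the sample mean and the second sample moment about the mean. For large samples of $n$ we find

$$n \operatorname{var} m_1 = m_1\{2 + m_2^2 + 2m_1(1+m_2)^2\}/m_2^2,$$
$$n \operatorname{var} m_2 = \{2 + m_2 + 2m_1(1+m_2)^2\}/m_1,$$
$$n \operatorname{cov}(m_1, m_2) = -2\{1 + m_1(1+m_2)^2\}/m_2,$$

so that the covariance matrix

$$\begin{pmatrix} \operatorname{var} m_1 & \operatorname{cov}(m_1, m_2) \\ \operatorname{cov}(m_1, m_2) & \operatorname{var} m_2 \end{pmatrix}$$

has a determinant of value

$$\{1 + (1+m_2)^2 + 2m_1(1+m_2)^3\}/(n^2 m_2). \qquad (6)$$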
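The large-sample variances above can be reproduced by the delta method. The sketch assumes the usual results $n\operatorname{var}(\text{mean}) = K_2$, $n\operatorname{cov}(\text{mean}, \text{variance}) = K_3$ and $n\operatorname{var}(\text{variance}) = K_4 + 2K_2^2$, and writes the estimators as $\tilde m_2 = \lambda_2/\lambda_1 - 1$ and $\tilde m_1 = \lambda_1^2/(\lambda_2 - \lambda_1)$. The function name is ours, and the parameter values are illustrative.

```python
from math import isclose


def moment_estimator_cov(m1, m2):
    """n var(m1~), n var(m2~) and n cov(m1~, m2~) for the moment estimators (delta method)."""
    c = m1 * m2
    k2 = c * (1 + m2)
    k3 = c * (1 + 3 * m2 + m2 ** 2)
    k4 = c * (1 + 7 * m2 + 6 * m2 ** 2 + m2 ** 3)
    cov = [[k2, k3], [k3, k4 + 2 * k2 ** 2]]     # (mean, variance) covariance, times n
    mean, var = c, k2
    # m1~ = mean**2 / (var - mean) and m2~ = var / mean - 1; gradients with respect to (mean, var):
    g1 = [(2 * mean * (var - mean) + mean ** 2) / (var - mean) ** 2, -mean ** 2 / (var - mean) ** 2]
    g2 = [-var / mean ** 2, 1 / mean]

    def quad(a, b):
        return sum(a[i] * cov[i][j] * b[j] for i in range(2) for j in range(2))

    return quad(g1, g1), quad(g2, g2), quad(g1, g2)


m1, m2 = 2.0, 1.5
v1, v2, cv = moment_estimator_cov(m1, m2)
det_closed = (1 + (1 + m2) ** 2 + 2 * m1 * (1 + m2) ** 3) / m2
print(v1, v2, cv, isclose(v1 * v2 - cv ** 2, det_closed))
```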

To find the efficiency of fitting the first two moments, we require the determinant of the information
matrix.





2. Likelihood equations and information matrix 
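By differentiating the g.f. (1) we find

$$\frac{\partial P_x}{\partial m_2} = \{xP_x - (x+1)P_{x+1}\}/m_2 \quad\text{and}\quad \frac{\partial P_x}{\partial m_1} = -P_x + \frac{(x+1)P_{x+1}}{m_1 m_2}. \qquad (7)$$

The likelihood of the sample $(n_0, n_1, \dots)$ is

$$L = \prod_{x \ge 0} (P_x)^{n_x},$$

and so for optimum statistics $\hat m_1$, $\hat m_2$

$$\sum{}' n_x (x+1)\frac{P_{x+1}}{P_x} = n\hat m_1 \hat m_2 = n\lambda_1,$$

with $\sum'$ indicating summation over the sample. $m_1 m_2$ is therefore efficiently estimated by the mean, and
to complete the solution we must approximate to

$$\sum{}' n_x (x+1)\frac{P_{x+1}}{P_x} = n\lambda_1. \qquad (8)$$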


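If $\tilde m_1$ and $\tilde m_2$ are estimates by moments, they can be improved by (8). For $\hat m_1 \hat m_2 = \lambda_1$, and writing
$P_x(m_2)$ to indicate $P_x$ with $m_1 = \lambda_1/m_2$, we have

$$\frac{d}{dm_2}\sum{}' n_x (x+1)\frac{P_{x+1}(m_2)}{P_x(m_2)} = \frac{1}{m_2}\sum{}' n_x (x+1)\frac{P_{x+1}}{P_x} + \frac{1+m_2}{m_2^2}\sum{}' n_x\left\{(x+1)^2\frac{P_{x+1}^2}{P_x^2} - (x+1)(x+2)\frac{P_{x+2}}{P_x}\right\}. \qquad (9)$$

Thus if

$$F(m_2) = \sum{}' n_x (x+1)\frac{P_{x+1}}{P_x} - n\lambda_1 \quad\text{and}\quad \hat m_2 = \tilde m_2 + \delta m_2,$$

then

$$\delta m_2 = -F(\tilde m_2)/F'(\tilde m_2),$$

where (9) gives $F'(m_2)$. The improved solution is therefore obtained by the use of frequencies calculated
in the moment solution.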
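One Newton step of this kind can be sketched as follows. The sketch assumes $F'(m_2) = \frac{1}{m_2}\sum' a + \frac{1+m_2}{m_2^2}\sum'(b - c)$, where $a = (x+1)P_{x+1}/P_x$, $b = a^2$ and $c = (x+1)(x+2)P_{x+2}/P_x$, with $m_1$ tied to $m_2$ by $m_1 = \lambda_1/m_2$. The probabilities come from Beall's recursion, which the note quotes in Section 4. The frequency table and the starting value are made up for illustration, and the helper names are ours. The test compares the analytic slope with a finite difference.

```python
from math import exp


def beall_probs(m1, m2, x_max):
    """P_0, ..., P_{x_max} of Neyman's type A by Beall's recursion
    (x+1) P_{x+1} = m1 m2 e^{-m2} * sum_{t=0}^{x} (m2**t / t!) P_{x-t}."""
    p = [exp(-m1 * (1 - exp(-m2)))]
    c = m1 * m2 * exp(-m2)
    for x in range(x_max):
        s, w = 0.0, 1.0                          # w runs through m2**t / t!
        for t in range(x + 1):
            s += w * p[x - t]
            w *= m2 / (t + 1)
        p.append(c * s / (x + 1))
    return p


def newton_pieces(freq, m2):
    """F(m2) and F'(m2), with m1 = lambda_1 / m2; freq[x] counts sample values equal to x."""
    n = sum(freq)
    lam1 = sum(x * f for x, f in enumerate(freq)) / n
    p = beall_probs(lam1 / m2, m2, len(freq) + 1)
    ratio = [(x + 1) * p[x + 1] / p[x] for x in range(len(freq) + 1)]   # (x+1) P_{x+1} / P_x
    a = sum(f * ratio[x] for x, f in enumerate(freq))
    b = sum(f * ratio[x] ** 2 for x, f in enumerate(freq))
    c = sum(f * ratio[x] * ratio[x + 1] for x, f in enumerate(freq))   # (x+1)(x+2) P_{x+2} / P_x
    return a - n * lam1, a / m2 + (1 + m2) / m2 ** 2 * (b - c)


freq = [30, 25, 28, 20, 12, 8, 4, 2, 1]     # made-up counts for x = 0..8
m2_moments = 1.2                             # stand-in for a moment estimate
F, dF = newton_pieces(freq, m2_moments)
print("F =", F, " F' =", dF, " improved m2 =", m2_moments - F / dF)
```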

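We now proceed to consider the efficiency. Using equations (7) and (4) we have

$$E\left(-\frac{\partial^2 \log P_x}{\partial m_1^2}\right) = -1 + \frac{\phi}{m_1^2 m_2^2},$$
$$E\left(-\frac{\partial^2 \log P_x}{\partial m_1\,\partial m_2}\right) = 1 + m_1 - \frac{\phi}{m_1 m_2^2},$$
$$E\left(-\frac{\partial^2 \log P_x}{\partial m_2^2}\right) = \frac{m_1}{m_2} - m_1(1+m_1) + \frac{\phi}{m_2^2},$$

where $\phi = E\{(x+1)^2 P_{x+1}^2/P_x^2\}$, and for the information determinant (the entries being $n$ times the expectations above)

$$|I| = n^2\{(1+m_2)\phi - m_1 m_2^2(m_1 + m_1 m_2 + m_2)\}/(m_1 m_2^3). \qquad (10)$$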
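The three expectations and the determinant (10) can be checked by summing the squared and cross-multiplied scores from (7) directly over a truncated range of $x$. The sketch computes $P_x$ by Beall's recursion. The truncation point `x_max` and the parameter values are assumptions.

```python
from math import exp, isclose


def beall_probs(m1, m2, x_max):
    """P_0..P_{x_max} by Beall's recursion (x+1) P_{x+1} = m1 m2 e^{-m2} sum_t m2^t/t! P_{x-t}."""
    p = [exp(-m1 * (1 - exp(-m2)))]
    c = m1 * m2 * exp(-m2)
    for x in range(x_max):
        s, w = 0.0, 1.0
        for t in range(x + 1):
            s += w * p[x - t]
            w *= m2 / (t + 1)
        p.append(c * s / (x + 1))
    return p


def information_per_observation(m1, m2, x_max=100):
    """I11, I12, I22 and phi by direct summation over x = 0..x_max-1."""
    p = beall_probs(m1, m2, x_max)
    i11 = i12 = i22 = phi = 0.0
    for x in range(x_max):
        r = (x + 1) * p[x + 1] / p[x]    # (x+1) P_{x+1} / P_x
        s1 = -1 + r / (m1 * m2)          # d log P_x / d m1, from (7)
        s2 = (x - r) / m2                # d log P_x / d m2, from (7)
        i11 += p[x] * s1 * s1
        i12 += p[x] * s1 * s2
        i22 += p[x] * s2 * s2
        phi += p[x] * r * r
    return i11, i12, i22, phi


m1, m2 = 2.0, 1.5
i11, i12, i22, phi = information_per_observation(m1, m2)
det_closed = ((1 + m2) * phi - m1 * m2 ** 2 * (m1 + m1 * m2 + m2)) / (m1 * m2 ** 3)
print(i11, i12, i22, phi, isclose(i11 * i22 - i12 ** 2, det_closed, rel_tol=1e-7))
```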


3. The evaluation of $\phi = E\{(x+1)^2 P_{x+1}^2/P_x^2\}$

Since

$$P_x = \frac{e^{-m_1} m_2^x}{x!}\left\{0^x + \frac{A'\,1^x}{1!} + \frac{A'^2\,2^x}{2!} + \frac{A'^3\,3^x}{3!} + \dots\right\} = \frac{e^{-m_1} m_2^x}{x!}\{A_0 + A_1 x + A_2 x(x-1) + A_3 x(x-1)(x-2) + \dots\},$$

where the $A$'s may be determined by the Gregory-Newton formula of interpolation, it is evident that we
may set up orthogonal polynomials with respect to $P_x$, defined by

$$\sum_{x=0}^{\infty} \theta_r(x)\,\theta_s(x)\,P_x = 0 \quad (r \neq s), \qquad \neq 0 \quad (r = s),$$

with $\theta_0 = 1$, say. With these polynomials we may then find an expression for $(x+1)P_{x+1}$ in the form

$$(x+1)P_{x+1} = \{B_0\theta_0(x) + B_1\theta_1(x) + B_2\theta_2(x) + \dots\}P_x, \qquad (11)$$

where the $B$'s are functions of $m_1$ and $m_2$. In fact we have

$$\sum (x+1)P_{x+1}\,\theta_r(x) = B_r \sum \theta_r^2(x)\,P_x.$$

Squaring (11) and summing, we have

$$\phi = \sum_{0}^{\infty} (x+1)^2\frac{P_{x+1}^2}{P_x} = B_0^2\sum P_x + B_1^2\sum\theta_1^2(x)P_x + B_2^2\sum\theta_2^2(x)P_x + B_3^2\sum\theta_3^2(x)P_x + \dots. \qquad (12)$$

This is a series of positive terms, the first two of which amount to $m_1 m_2^2[m_1(1+m_2) + m_2]/(1+m_2)$, and
thus our expression (10) for the determinant of the information matrix is a series of positive terms. The
first two terms cancel exactly in (10), and the third, combined with (6), gives an efficiency of exactly unity;
the first term of significance therefore turns out to be the fourth in (12). The value of (12) can therefore be found in
two stages: (a) the determination of the orthogonal polynomials, (b) the evaluation of the $B$'s.

(a) The evaluation of the θ's will be illustrated by finding $\theta_3(x)$. It appears most convenient to assume

$$\theta_3(x) = (x-\lambda_1)^3 - \mu_3 + A[(x-\lambda_1)^2 - \mu_2] + B[x-\lambda_1],$$

where $\lambda_1$ is $m_1 m_2$ and the μ's refer to the moments of $P_x$ about the mean. The orthogonality conditions
lead to

$$-\theta_3(x) + (x-\lambda_1)^3 - \mu_3 + A[(x-\lambda_1)^2 - \mu_2] + B(x-\lambda_1) = 0,$$
$$\mu_4 + A\mu_3 + B\mu_2 = 0,$$
$$\mu_5 - \mu_3\mu_2 + A[\mu_4 - \mu_2^2] + B\mu_3 = 0. \qquad (13)$$

The moments in (13) may be replaced by cumulants, using the well-known formulae

$$K_2 = \mu_2, \qquad K_3 = \mu_3, \qquad K_4 + 3K_2^2 = \mu_4,$$
$$K_5 + 10K_3K_2 = \mu_5, \qquad K_6 + 15K_4K_2 + 10K_3^2 + 15K_2^3 = \mu_6,$$

etc., and these in turn expressed in terms of $m_1$ and $m_2$ by means of (4).

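(b) For $B_3$ we have

$$B_3\sum_{0}^{\infty}\theta_3^2(x)P_x = \sum_{0}^{\infty}(x+1)P_{x+1}\,\theta_3(x),$$

so that we require the value of

$$\sum_{0}^{\infty}(x+1)P_{x+1}\,\theta_3(x) = \sum_{0}^{\infty}(x - \lambda_1 + \lambda_1)\,\theta_3(x-1)\,P_x.$$

Expanding and simplifying, this becomes

$$\mu_4 - 3\mu_3 + 3(1-\lambda_1)\mu_2 - \lambda_1 + A[\mu_3 - 2\mu_2 + \lambda_1] + B[\mu_2 - \lambda_1].$$

Hence from (13) we have

$$B_3^2\sum\theta_3^2(x)P_x = \frac{\begin{vmatrix} \mu_4 - 3\mu_3 + 3(1-\lambda_1)\mu_2 - \lambda_1 & \mu_3 - 2\mu_2 + \lambda_1 & \mu_2 - \lambda_1 \\ \mu_4 & \mu_3 & \mu_2 \\ \mu_5 - \mu_3\mu_2 & \mu_4 - \mu_2^2 & \mu_3 \end{vmatrix}^2}{\begin{vmatrix} \mu_3 & \mu_2 \\ \mu_4 - \mu_2^2 & \mu_3 \end{vmatrix}\cdot\begin{vmatrix} \mu_6 - \mu_3^2 & \mu_5 - \mu_3\mu_2 & \mu_4 \\ \mu_4 & \mu_3 & \mu_2 \\ \mu_5 - \mu_3\mu_2 & \mu_4 - \mu_2^2 & \mu_3 \end{vmatrix}} = \frac{4m_1m_2^5[1 + m_1(1+m_2)(4+m_2)]^2}{[2 + 2m_2 + m_2^2 + 2m_1(1+m_2)^3]\,a},$$

where

$$a = 12 + 12m_2 + 6m_2^2 + 2m_2^3 + m_1(48 + 144m_2 + 144m_2^2 + 80m_2^3 + 25m_2^4 + 2m_2^5) + 24m_1^2(1+m_2)^3(2 + 2m_2 + m_2^2) + 12m_1^3(1+m_2)^6. \qquad (14)$$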
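For the earlier polynomials it may be verified that

$$\theta_0(x) = 1, \qquad B_0^2\sum P_x = m_1^2m_2^2,$$
$$\theta_1(x) = x - m_1m_2, \qquad B_1^2\sum\theta_1^2(x)P_x = \frac{m_1m_2^3}{1+m_2},$$
$$\theta_2(x) = (x - m_1m_2)^2 - \frac{1 + 3m_2 + m_2^2}{1+m_2}(x - m_1m_2) - m_1m_2(1+m_2), \qquad B_2^2\sum\theta_2^2(x)P_x = \frac{m_1m_2^4}{(1+m_2)[1 + (1+m_2)^2 + 2m_1(1+m_2)^3]}.$$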
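The first three terms of (12) can be verified numerically. The sketch takes $\theta_2(x) = (x - m_1m_2)^2 - (K_3/K_2)(x - m_1m_2) - K_2$, computes $B_r^2\sum\theta_r^2P_x = \{\sum(x+1)P_{x+1}\theta_r(x)\}^2/\sum\theta_r^2P_x$ by summation over a truncated range, and checks that the partial sum does not exceed φ (Bessel's inequality). The truncation and parameter values are assumptions, and the helper names are ours.

```python
from math import exp, isclose


def beall_probs(m1, m2, x_max):
    """P_0..P_{x_max} by Beall's recursion (x+1) P_{x+1} = m1 m2 e^{-m2} sum_t m2^t/t! P_{x-t}."""
    p = [exp(-m1 * (1 - exp(-m2)))]
    c = m1 * m2 * exp(-m2)
    for x in range(x_max):
        s, w = 0.0, 1.0
        for t in range(x + 1):
            s += w * p[x - t]
            w *= m2 / (t + 1)
        p.append(c * s / (x + 1))
    return p


def leading_terms(m1, m2, x_max=100):
    """[B_r**2 * sum(theta_r**2 P_x) for r = 0, 1, 2] and phi, by direct summation."""
    p = beall_probs(m1, m2, x_max)
    r = [(x + 1) * p[x + 1] for x in range(x_max)]                    # (x+1) P_{x+1}
    lam1 = m1 * m2
    k3_over_k2 = (1 + 3 * m2 + m2 ** 2) / (1 + m2)
    thetas = [
        lambda x: 1.0,
        lambda x: x - lam1,
        lambda x: (x - lam1) ** 2 - k3_over_k2 * (x - lam1) - lam1 * (1 + m2),
    ]
    terms = []
    for theta in thetas:
        num = sum(r[x] * theta(x) for x in range(x_max))              # B_r * sum(theta_r**2 P_x)
        den = sum(p[x] * theta(x) ** 2 for x in range(x_max))         # sum(theta_r**2 P_x)
        terms.append(num * num / den)
    phi = sum(r[x] ** 2 / p[x] for x in range(x_max))
    return terms, phi


terms, phi = leading_terms(1.0, 1.0)
print(terms, sum(terms), phi)
```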



Inserting these values in (10) and multiplying by the covariance determinant (6), we have for the efficiency,
the remaining terms of (12), all positive, being neglected,

$$E \le \left\{1 + \frac{4m_2(1+m_2)[1 + m_1(1+m_2)(4+m_2)]^2}{a}\right\}^{-1},$$

where $a$ is given by (14).
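The bound can also be computed without the closed form by following the route of (13) directly. The steps are: cumulants to central moments, solve for $A$ and $B$, then evaluate $B_3\sum\theta_3^2P_x$ and $\sum\theta_3^2P_x$. The sketch below does this and compares the result with the closed form. In the closed form, the coefficient of $m_1m_2^4$ inside the bracket is taken as 25 and the last term of $a$ as $12m_1^3(1+m_2)^6$, which is what the exact computation requires at the points tested. The rounded values reproduce entries of Table 1, for example 69 at $m_1 = m_2 = 1$. The function names are ours.

```python
from math import isclose


def central_moments(m1, m2):
    """mu_2..mu_6 of Neyman's type A from its cumulants K_2..K_6."""
    c, m = m1 * m2, m2
    k2 = c * (1 + m)
    k3 = c * (1 + 3 * m + m ** 2)
    k4 = c * (1 + 7 * m + 6 * m ** 2 + m ** 3)
    k5 = c * (1 + 15 * m + 25 * m ** 2 + 10 * m ** 3 + m ** 4)
    k6 = c * (1 + 31 * m + 90 * m ** 2 + 65 * m ** 3 + 15 * m ** 4 + m ** 5)
    mu4 = k4 + 3 * k2 ** 2
    mu5 = k5 + 10 * k3 * k2
    mu6 = k6 + 15 * k4 * k2 + 10 * k3 ** 2 + 15 * k2 ** 3
    return k2, k3, mu4, mu5, mu6


def b3_term(m1, m2):
    """B_3**2 * sum(theta_3**2 P_x): solve (13) for A and B, then evaluate."""
    mu2, mu3, mu4, mu5, mu6 = central_moments(m1, m2)
    lam1 = m1 * m2
    # A mu3 + B mu2 = -mu4 ;  A (mu4 - mu2**2) + B mu3 = -(mu5 - mu3 mu2)
    a11, a12, r1 = mu3, mu2, -mu4
    a21, a22, r2 = mu4 - mu2 ** 2, mu3, -(mu5 - mu3 * mu2)
    det = a11 * a22 - a12 * a21
    A = (r1 * a22 - a12 * r2) / det
    B = (a11 * r2 - a21 * r1) / det
    v = (mu4 - 3 * mu3 + 3 * (1 - lam1) * mu2 - lam1
         + A * (mu3 - 2 * mu2 + lam1) + B * (mu2 - lam1))      # B_3 * sum(theta_3**2 P_x)
    s = mu6 - mu3 ** 2 + A * (mu5 - mu3 * mu2) + B * mu4       # sum(theta_3**2 P_x)
    return v * v / s


def efficiency_bound(m1, m2):
    """Efficiency keeping the terms of (12) up to B_3 (an upper bound)."""
    q = 1 + (1 + m2) ** 2 + 2 * m1 * (1 + m2) ** 3
    return 1 / (1 + q * (1 + m2) * b3_term(m1, m2) / (m1 * m2 ** 4))


def efficiency_closed_form(m1, m2):
    a = (12 + 12 * m2 + 6 * m2 ** 2 + 2 * m2 ** 3
         + m1 * (48 + 144 * m2 + 144 * m2 ** 2 + 80 * m2 ** 3 + 25 * m2 ** 4 + 2 * m2 ** 5)
         + 24 * m1 ** 2 * (1 + m2) ** 3 * (2 + 2 * m2 + m2 ** 2)
         + 12 * m1 ** 3 * (1 + m2) ** 6)
    return 1 / (1 + 4 * m2 * (1 + m2) * (1 + m1 * (1 + m2) * (4 + m2)) ** 2 / a)


for m2 in (0.1, 1.0, 5.0):
    print(m2, [round(100 * efficiency_bound(m1, m2)) for m1 in (0.1, 1.0, 10.0)])
```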

Table 1 gives an upper bound to the value of percentage efficiency, E, for various values of $m_1$
and $m_2$.

Table 1

m2 \ m1    0.1   0.5   1.0   2.0   3.0   4.0   6.0   10.0
0.1         96    93    93    93    94    95    96    97
0.2         92    88    87    88    89    91    93    96
0.5         82    76    77    80    83    85    89    92
1.0         73    67    69    76    80    83    87    91
2.0         66    61    67    76    81    84    88    92
3.0         62    60    67    77    82    85    89    93
5.0         59    59    68    79    84    87    91    94

It is clear from Table 1 that if $m_2$ is small, say less than 0.2, then E is generally in the region
of 90 % or above. For $0.2 < m_2 < 1.0$, E may be as low as 70 %, but otherwise it lies between 75 and
90 %, and for these values it is not easy to decide whether the 'improved' fit is worth the additional
computation involved. If $m_2 \ge 1.0$ and $m_1 < 3.0$, E may be less than 70 %, and in this case the maximum
likelihood approximation (9) can be applied.


4. A numerical illustration

That Neyman's type A distribution of two parameters and the negative binomial are related, in that
they may arise as the result of heterogeneity in the population, has been pointed out by W. Feller (1943).
The similarity of the two may be appreciated by comparing the first three moments:

            Neyman's type A                       Negative binomial
            (parameters $m_1$ and $m_2$)          (parameters $m_1$ and $m_2$)
Mean        $m_1 m_2$                             $m_1 m_2$
$\mu_2$     $m_1 m_2 (1 + m_2)$                   $m_1 m_2 (1 + m_2)$
$\mu_3$     $m_1 m_2 (1 + 3m_2 + m_2^2)$          $m_1 m_2 (1 + 3m_2 + 2m_2^2)$


It is thus reasonable to suppose that data satisfactorily described by the negative binomial may be
suitable for our present purpose, provided the parameters satisfy the conditions $m_2 \ge 1.0$, $m_1 < 3.0$. Such
a distribution is given by Ove Lundberg (1940) concerning insurance claims and incapacity caused by
sickness or accident.

The moments of the distribution turn out to be

$$\lambda_1 = 2.805871, \qquad \lambda_2 = 6.404549,$$

from which, using the method of moments, we obtain the estimates

$$\tilde m_1 = 2.187723, \qquad \tilde m_2 = 1.282553.$$

Frequencies for these values are obtained by using the expression due to Beall (Neyman, 1939), namely,

$$P_{x+1} = \frac{m_1 m_2 e^{-m_2}}{x+1}\sum_{t=0}^{x}\frac{m_2^t}{t!}P_{x-t},$$

with $m_1 m_2 e^{-m_2} = 0.7781466$ and $P_0 = e^{-a}$, where $a = m_1(1 - e^{-m_2})$, so that $P_0 = 0.2057682$.
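The moment fit can be reproduced with a short sketch. It assumes $n = 1056$ observations with $\sum x\,n_x = n\lambda_1 = 2963$. Those figures are implied by $P_0$, by the fitted frequencies of Table 2 and by the value $n\lambda_1 = 2963$ quoted below. The second moment is taken as $\lambda_2 = 6.404549$. The helper name is ours. The first few expected frequencies should agree with the 'moments' column of Table 2 to within rounding.

```python
from math import exp


def beall_probs(m1, m2, x_max):
    """P_0..P_{x_max} by Beall's recursion (x+1) P_{x+1} = m1 m2 e^{-m2} sum_t m2^t/t! P_{x-t}."""
    p = [exp(-m1 * (1 - exp(-m2)))]
    c = m1 * m2 * exp(-m2)
    for x in range(x_max):
        s, w = 0.0, 1.0
        for t in range(x + 1):
            s += w * p[x - t]
            w *= m2 / (t + 1)
        p.append(c * s / (x + 1))
    return p


n = 1056              # assumed sample size (see lead-in)
sum_x = 2963          # assumed value of n * lambda_1
lam1 = sum_x / n      # sample mean
lam2 = 6.404549       # second sample moment about the mean, as printed

m2 = lam2 / lam1 - 1  # method of moments: lambda_2 / lambda_1 = 1 + m2
m1 = lam1 / m2        # and lambda_1 = m1 m2
p = beall_probs(m1, m2, 16)
expected = [n * px for px in p]
print(f"m1 = {m1:.6f}, m2 = {m2:.6f}, m1 m2 e^-m2 = {m1 * m2 * exp(-m2):.7f}, P0 = {p[0]:.7f}")
print([round(f, 2) for f in expected[:11]])
```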




Table 2

 x      Observed    Expected     Expected       χ²          χ²
        frequency   (moments)    (likelihood)   (moments)   (likelihood)
 0        187        217.29       197.15         4.22        0.52
 1        185        169.08       178.00         1.50        0.28
 2        200        174.22       181.27         3.81        1.94
 3        164        147.79       153.44         1.78        0.73
 4        107        114.14       117.67         0.45        0.97
 5         68         82.63        83.97         2.59        3.04
 6         49         56.70        56.49         1.05        0.99
 7         39         37.17        36.18         0.09        0.22
 8         21         23.44        22.21         0.25        0.07
 9         12         14.29        13.15         0.37        0.10
10         11          8.45         7.54         0.77        1.59
11          2          4.87         4.20
12          5          2.74         2.28
13          2          1.51         1.21
14          3          0.81         0.63
15          1          0.43         0.32
>15         0          0.46         0.31
11 and over (grouped)
           13         10.81         8.95         0.44        1.83

Total    1056                                   17.32       12.28
log10 L             -985.948937  -984.902666

The fit by moments is given in column 3 of Table 2. For an improved fit we set up a table of values consisting
of (a) $(x+1)P_{x+1}/P_x$, (b) $(x+1)^2P_{x+1}^2/P_x^2$, which is easily found from (a), and (c) $(x+1)(x+2)P_{x+2}/P_x$,
in each of which $x$ takes the values 0, 1, 2, ..., 16. In the present case we have

$$\sum{}' n_x (x+1)P_{x+1}/P_x = 2983.669210,$$
$$\sum{}' n_x (x+1)^2P_{x+1}^2/P_x^2 = 10650.201104,$$
$$\sum{}' n_x (x+1)(x+2)P_{x+2}/P_x = 12126.916881.$$

Using these in (9) we find $\delta m_2 = -0.14863$, and the improved estimates $\hat m_1 = 2.474481$, $\hat m_2 = 1.133923$.
Since with these values $\sum' n_x(x+1)P_{x+1}/P_x = 2966.031424$, it is clear that we are much nearer a maximum
likelihood solution, for which

$$\sum{}' n_x (x+1)P_{x+1}/P_x = n\lambda_1 = 2963.$$

The fit with the improved estimates is shown in column 4 of the table. The improvement is probably
exaggerated by the value of χ², where we have grouped the last five frequencies. A more reliable idea is
given by $\log_{10} L$, shown at the bottom of the table, and it is evident that an improvement has been effected.


REFERENCES 

Feller, W. (1943). On a general class of 'contagious' distributions. Ann. Math. Statist. 14, 389.

Fisher, R. A. (1921). On the mathematical foundations of theoretical statistics. Philos. Trans. A,
222, 309.

Fisher, R. A. (1941). The negative binomial distribution. Ann. Eugen., Lond., 11, 182.

Lundberg, O. (1940). On Random Processes and their Application to Sickness and Accident Statistics.
Uppsala: Almquist and Wiksells.

Neyman, J. (1939). On a new class of 'contagious' distributions applicable in entomology and
bacteriology. Ann. Math. Statist. 10, 36.












Large-sample theory of sequential estimation 


By F. J. ANSCOMBE


Haldane (1945) and Finney (1949) have considered a sequential method of sampling a binomial population,
called inverse sampling, in which sampling terminates when a specified number $k$ of individuals have been
found possessing the attribute. If the proportion θ of individuals with the attribute in the population is
small, θ is estimated with coefficient of variation depending only on the value of $k$. Thus the coefficient
of variation of the estimate can be specified in advance. Tweedie (1945) has considered inverse sampling
more generally.

Several recent papers in the Annals of Mathematical Statistics have dealt with the estimation of parameters
in sequential sampling; in particular, Blackwell & Girshick (1947) have given a lower bound to the
variance of an unbiased estimate based on a sufficient statistic. A general problem of sequential estimation
suggests itself, namely, to formulate a rule of sampling such that an unknown population parameter can
be estimated with specified accuracy and with minimum expected sample size, whatever the true value
of the parameter. The accuracy of estimation might be specified by width of confidence interval, or by
variance or coefficient of variation of the estimate, or in any other such way. It is the object of this note
to point out that, while an exact small-sample solution of the problem is in general very difficult, the
large-sample theory (valid for large expected sample size and high accuracy of estimation) is relatively
easy.

Let us consider the estimation of the mean θ of a population of which the variance is a known finite
function $v(\theta)$ of θ. The observations will be denoted by $z_1, z_2, \dots$, and the cumulative sum of the first $m$
observations by $Z_m$. Sampling continues until a certain inequality is satisfied, of the form $Z_m \ge k(m)$ or
$Z_m \le k(m)$, where $k(m)$ is a function of $m$ determined in advance. The value of $m$ for which the inequality
is first satisfied will be denoted by $n$. If we represent the sampling on a diagram in which $m$ is abscissa
and the ordinate is denoted generally by $y$, $n$ is the abscissa of the point where the sample path $y = Z_m$
first meets or crosses the boundary $y = k(m)$. We shall estimate θ by

$$\hat\theta = \frac{k(n)}{n}. \qquad (1)$$


It will be shown that, under certain conditions, $\hat\theta$ is asymptotically normally distributed with mean θ
and variance $v(\theta)/n_0$, where $n_0$ satisfies

$$k(n_0) = \theta n_0, \qquad (2)$$

i.e. $n_0$ is the abscissa of the point where the mean path $y = \theta m$ intersects the boundary. θ will therefore be
estimated with specified variance $a^2$ if the equation of the boundary is

$$\frac{1}{m}\,v\!\left(\frac{y}{m}\right) = a^2, \qquad (3)$$

and with specified coefficient of variation $b$ if the boundary is

$$\frac{m}{y^2}\,v\!\left(\frac{y}{m}\right) = b^2. \qquad (4)$$

To establish this result (a sketch proof only will be given), we consider a sequence of sequential samplings
performed on the same population (so that θ is constant) and defined by a sequence of boundaries $y = k(m)$
such that $n_0 \to \infty$. By (2) we have $|k(n_0)| \to \infty$ also, if $\theta \neq 0$. Let us suppose that the boundary system
satisfies the following conditions:

(i) The boundary is approximately linear in the neighbourhood of $n_0$. More exactly, if $m$ is confined
to an interval $(n_0 - c\sqrt{n_0},\ n_0 + c\sqrt{n_0})$, where $c$ is constant, then as $n_0 \to \infty$

$$k(m) = k(n_0) + (m - n_0)k'(n_0) + O(1). \qquad (5)$$

(ii) We can ignore the possibility that the boundary will be crossed elsewhere than in the neighbourhood
of $n_0$, i.e. by choosing $c$ sufficiently large we can make the chance arbitrarily small that $|n - n_0| > c\sqrt{n_0}$
if $n_0$ is large.




(iii) The boundary crosses the mean path $y = \theta m$ at a non-zero angle. More exactly, $k'(n_0)$ tends to
a limit not equal to θ as $n_0 \to \infty$.

It has been assumed here that the limit of $k'(n_0)$ is finite. If it is infinite, condition (i) must be reformulated:
if $k(m)$ is confined to an interval $(k(n_0) - c\sqrt{k(n_0)},\ k(n_0) + c\sqrt{k(n_0)})$, where $c$ is constant, then as
$n_0 \to \infty$

$$m = n_0 + O(1). \qquad (6)$$


Conditions (i) and (iii) are in fact satisfied by the boundaries defined by equations (3) and (4). For the
first of these,

$$n_0 = \frac{v(\theta)}{a^2}, \qquad (7)$$

$$k'(n_0) = \theta + \frac{v(\theta)}{v'(\theta)}, \qquad (8)$$

and if we consider a sequence of boundaries for which $a \to 0$, we have $n_0 \to \infty$, $k'(n_0)$ is a constant (finite or
infinite) not equal to θ if $v(\theta) > 0$, and (provided $k'(n_0)$ is finite) $k''(n_0) = O(n_0^{-1})$. Similarly for the second,
when $\theta \neq 0$,

$$n_0 = \frac{v(\theta)}{b^2\theta^2}, \qquad (9)$$

$$k'(n_0) = \frac{\theta\{\theta v'(\theta) - v(\theta)\}}{\theta v'(\theta) - 2v(\theta)}, \qquad (10)$$

and the same conclusions follow. Condition (ii), however, is not always satisfied by these boundaries,
and they may need modification, as described below.

Assuming that all three conditions are satisfied, we consider first the case $\lim k'(n_0)$ finite, and show
that if $m$ has any value in the range $|m - n_0| < c\sqrt{n_0}$,

$$\operatorname{prob}(n \le m) = \operatorname{prob}(Z_m \ge k(m)) + o(1) \qquad (11)$$

as $n_0 \to \infty$ if $k'(n_0) < \theta$, while the inequality on the right-hand side is reversed if $k'(n_0) > \theta$. Here $Z_m$ is the
cumulative sum of $m$ observations, taken without regard to whether the boundary has been crossed or
not. Let us suppose that $k'(n_0) < \theta$. Then the left-hand side of (11) exceeds the first member on the right-hand
side by the chance that a path reaches the boundary for some $n < m$ and then recrosses it to give
$Z_m < k(m)$. This requires that $\sum_{i=n+1}^{m} z_i$ shall be less than $(m-n)k'(n_0)$, while its mean is $(m-n)\theta$ and its
variance is $(m-n)v(\theta)$. Now as $n_0 \to \infty$ the variance of $n$ is of order $n_0$, i.e. tends to a finite non-zero
multiple of $n_0$. Hence, applying the central limit theorem, and integrating over values of $n < m$, we see
that the chance in question is of order $\Phi(-n_0^{1/4})$, where Φ denotes the normal integral, and so is o(1).
Similarly if $k'(n_0) > \theta$. It may be noted in passing that if $z_i > 0$ always and if $k'(m) < 0$ for all $m$, the boundary
once reached cannot be recrossed, and equation (11), without the o(1) on the right-hand side, gives the
exact distribution of $n$ in terms of the distribution of $Z_m$.

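Now the distribution of $Z_m$ is asymptotically normal with mean $\theta m$ and variance $v(\theta)m$. Let us set
$m = n_0 + \mu$, where $|\mu| < c\sqrt{n_0}$. Then asymptotically

$$\operatorname{prob}(Z_m \ge k(m)) = \operatorname{prob}\{Z_m \ge k(n_0) + \mu k'(n_0)\} = \Phi\left(\frac{\mu[\theta - k'(n_0)]}{\sqrt{v(\theta)(n_0 + \mu)}}\right). \qquad (12)$$

This takes values other than 0 and 1 only when $\mu = O(\sqrt{n_0})$, and so we may neglect the μ in the denominator.
Thus $n$ is asymptotically normally distributed with mean $n_0$ and variance $n_0 v(\theta)[\theta - k'(n_0)]^{-2}$. But from
(1) we have asymptotically

$$\hat\theta = \frac{\theta n_0 + \mu k'(n_0)}{n_0 + \mu} = \theta - \frac{\mu[\theta - k'(n_0)]}{n_0}. \qquad (13)$$

Hence we have the desired result that $\hat\theta$ is asymptotically normally distributed with mean θ and variance
$v(\theta)/n_0$.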

If the boundary is wholly vertical, the result is still true, since the sample size $n$ is now fixed. It can
also be established if $\lim k'(n_0) = \infty$ for some value of θ, using equation (6).

We can replace $n_0$ by $E(n)$ in the variance. The asymptotic variance is then equal to Blackwell &
Girshick's lower bound to the variance of an unbiased estimate of θ. We may also replace $k(n)$ by $Z_n$
in the numerator of (1).




Let us now consider some examples of boundaries. First suppose $v(\theta)$ is a known constant. Then
equation (3) represents a vertical boundary, i.e. fixed sample size, with

$$m = v/a^2. \qquad (14)$$

Stein & Wald (1947) have demonstrated the optimum character of this boundary. Equation (4) gives
a parabolic boundary,

$$y^2 = (v/b^2)\,m. \qquad (15)$$

This boundary, as it stands, passes through the origin. The above theory will apply if we can ignore the
possibility that the boundary will be reached near the origin instead of near $m = n_0$. The boundary should
therefore be modified for $m$ small so that it does not approach very close to the origin. If $|\theta|$ is not too
large, θ will be estimated with the specified coefficient of variation $b$, but it will be estimated with greater
accuracy than required if $|\theta|$ is so large that the sample path is likely to meet the modified part of the
boundary.

Next, let us consider the Poisson distribution, for which $v(\theta) = \theta$. Equation (3) gives a parabolic
boundary,

$$y = a^2 m^2, \qquad (16)$$

which again needs modification for $m$ small (so that if θ is near 0 it will be estimated with lower variance
than $a^2$). Equation (4) gives a horizontal boundary,

$$y = 1/b^2. \qquad (17)$$
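A small Monte Carlo sketch of boundary (17) for Poisson observations is given below. It is our illustration, not part of the note. The values θ = 2 and b = 0.1 are assumptions, and the estimate $Z_n/n$ is used, as the note permits. The coefficient of variation across repetitions should come out close to $b$. Poisson variates are drawn with Knuth's multiplication method, which is adequate for small means.

```python
import math
import random


def poisson_draw(rng, lam):
    """Poisson variate by Knuth's multiplication method (fine for small lam)."""
    limit, k, product = math.exp(-lam), 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def horizontal_boundary_estimate(rng, theta, b):
    """Sample until Z_m >= 1/b**2, the boundary (17), and return Z_n / n."""
    target = 1.0 / b ** 2
    total, m = 0, 0
    while total < target:
        total += poisson_draw(rng, theta)
        m += 1
    return total / m


rng = random.Random(12345)
theta, b = 2.0, 0.1                       # illustrative values
estimates = [horizontal_boundary_estimate(rng, theta, b) for _ in range(4000)]
mean = sum(estimates) / len(estimates)
sd = math.sqrt(sum((e - mean) ** 2 for e in estimates) / (len(estimates) - 1))
cv = sd / mean
print(f"mean estimate {mean:.3f}, coefficient of variation {cv:.3f} (target b = {b})")
```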

For the type III distribution of a variance estimate of ν degrees of freedom from a normal population
with variance θ, we have $v(\theta) = (2/\nu)\theta^2$. Equation (3) gives the curve

$$y^2 = \tfrac{1}{2}\nu a^2 m^3, \qquad (18)$$

which needs modification for $m$ small. Equation (4) gives a vertical boundary with fixed sample size

$$m = 2/(\nu b^2). \qquad (19)$$

Finally, let us consider sampling a binomial population in which a proportion θ of individuals possess
a certain attribute. Then $v(\theta) = \theta(1 - \theta)$. To obtain a specified coefficient of variation $b$ of $\hat\theta$, we use the
boundary given by equation (4), namely,

$$y = \frac{1}{b^2 + 1/m}, \qquad (20)$$

but modified near $m = 0$. For θ small, so that $n_0$ is large, this is practically the horizontal boundary of
'inverse sampling', which, as already remarked, gives nearly constant coefficient of variation if θ is small.
Suppose now that we wish to estimate θ with coefficient of variation $b$ if θ is small, and to estimate $1 - \theta$
with the same coefficient of variation if $1 - \theta$ is small. We shall meet this requirement if $\ln\{\theta/(1 - \theta)\}$ is
estimated with constant variance $b^2$ (assuming, as usual, that the variance is small). Now asymptotically

$$\operatorname{var}[\ln\{\hat\theta/(1-\hat\theta)\}] = [\theta(1-\theta)]^{-2}\operatorname{var}(\hat\theta) = [\theta(1-\theta)\,n_0]^{-1}, \qquad (21)$$

and this will be equal to $b^2$ if the boundary satisfies

$$\frac{1}{y} + \frac{1}{m - y} = b^2, \quad\text{or}\quad x + y = b^2xy, \qquad (22)$$


where $x = m - y$. If, therefore, we plot the number of individuals found possessing the attribute as
ordinate against the number without as abscissa, sampling proceeds until the rectangular hyperbola (22)
is reached. For θ near 0 this is nearly the horizontal straight line $y = b^{-2}$, and for θ near 1 it is nearly the
vertical line $x = b^{-2}$.
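The hyperbolic boundary (22) can be illustrated by simulation. The sketch below is ours, with θ = 0.3 and b = 0.2 as assumed values. It draws Bernoulli items until $x + y \le b^2xy$ (with both counts positive) and records $\ln(y/x)$. Across repetitions, its standard deviation should be close to $b$, as (21) predicts.

```python
import math
import random


def hyperbola_sample(rng, theta, b):
    """Draw Bernoulli(theta) items until x + y <= b**2 * x * y, i.e. boundary (22).
    y counts items with the attribute, x those without; returns ln(y / x)."""
    x = y = 0
    while not (x > 0 and y > 0 and x + y <= b * b * x * y):
        if rng.random() < theta:
            y += 1
        else:
            x += 1
    return math.log(y / x)


rng = random.Random(2024)
theta, b = 0.3, 0.2                       # illustrative values
logits = [hyperbola_sample(rng, theta, b) for _ in range(3000)]
mean = sum(logits) / len(logits)
sd = math.sqrt(sum((v - mean) ** 2 for v in logits) / (len(logits) - 1))
print(f"mean logit {mean:.3f} (true {math.log(theta / (1 - theta)):.3f}), sd {sd:.3f} (target b = {b})")
```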

To sum up, we have considered the estimation of a single unknown parameter. We confine attention
to statistics which are the sum $Z_m$ of the observations $z_i$ in some scale of measurement. If we set $E(z_i) = \theta$,
then $\operatorname{var}(z_i)$ can be expressed as a function $v(\theta)$ of θ. We find that the variance of $Z_n/n$ is, for $n$ large, equal
to $v(\theta)/n$, whether $n$ is fixed in advance or determined by a single-boundary sequential procedure satisfying
conditions (i)-(iii). If we choose a scale of measurement of the observations such that, in samples of fixed
size, the unknown parameter is estimated with minimum variance, then the use of the appropriate
sequential procedure will secure the desired accuracy of estimation with minimum average sample size.





If there is more than one unknown parameter, and the variance $v$ of the observations $z_i$ is finite but not
a known function of the mean θ, $v$ must be estimated as sampling proceeds, and a boundary cannot be
specified in advance. Let us suppose that $v$ can be estimated from a sample of $m$ observations by an
estimate $\hat v$ that is asymptotically normal with variance of order $m^{-1}$ as $m \to \infty$. When, say, half the sampling
has been done that will ultimately be seen to be needed, $\hat v$ satisfies in probability the relation

$$\hat v = v\{1 + O(n^{-1/2})\}, \qquad (23)$$

and so the eventual sample size $n$ and estimate $\hat\theta$ of θ satisfy

$$\operatorname{var}(\hat\theta) \sim \hat v/n. \qquad (24)$$

Thus the required value of $n$ can be predicted at this half-way stage with asymptotic validity.

I am indebted to Mr D. V. Lindley for helpful criticism. The treatment given above is somewhat
heuristic in places, notably in the argument used in arriving at equation (11). It seems to me, however,
that the line of approach may prove to be worth pursuing.


REFERENCES 

Blackwell, D. & Girshick, M. A. (1947). A lower bound for the variance of some unbiased sequential
estimates. Ann. Math. Statist. 18, 277.

Finney, D. J. (1949). On a method of estimating frequencies. Biometrika, 36, 233.

Haldane, J. B. S. (1945). On a method of estimating frequencies. Biometrika, 33, 222.

Stein, C. & Wald, A. (1947). Sequential confidence intervals for the mean of a normal distribution with
known variance. Ann. Math. Statist. 18, 427.

Tweedie, M. C. K. (1945). Letter in Nature, Lond., 155, 463.


A historical note on the method of least squares 


By R. L. PLACKETT, University of Liverpool 


1. The purposes of this note are: 

(i) to summarize the justifications by Laplace, Gauss and Markoff of the method of least squares; 

(ii) to suggest that Gauss was the first who justified least squares as giving those linear estimates 
which are unbiased of minimum variance; 

(iii) to modernize and extend his proof to cover a general theorem due to Aitken.

It is not my object to provoke controversy, and I have attempted to indicate where a personal opinion 
is intended. 

2. The method of least squares has been in use now for over 150 years. During the nineteenth century
the writings of Todhunter (1865), Merriman (1877) and others gave the impression that Laplace (1812-20,
collected works 1886) was largely responsible for putting the method on a theoretical basis by means of
the calculus of probability, whereas the contribution of Gauss (1821, collected works 1873) was minimized
or ignored. Lately, the emphasis has changed, and in recent papers and text-books Markoff (1912)
is credited with justifying the method without superfluous assumptions of normality. For these reasons,
it seems desirable to disentangle the various justifications proposed and to allot credit in due proportion.

3. In general let $\theta$ $(s \times 1)$ be a vector of unknown parameters, $x$ $(n \times 1)$ a vector of observations, $e$ $(n \times 1)$
a vector of errors and $A$ $(n \times s)$ a matrix of known quantities; so that

$$A\theta - x = e.$$

Further, suppose that $W$ $(n \times n)$ is a diagonal matrix whose elements are the reciprocals of the error
variances. It is required to form an estimate $\theta^*$ of θ. The method of least squares leads to estimates
which satisfy

$$A'WA\theta^* = A'Wx.$$

Neither Laplace nor Gauss used matrix notation, but their results can immediately be written in that form.




4. Laplace (1812-20) discusses the method of least squares in Book 2, Chapter 4, and in the first three 
Supplements. He proves a series of results which are summarized—I hope fairly—in the following: 

Theorem. Among all $s \times n$ matrices $F$ leading to estimates of the form $FA\theta^* = Fx$, the expected values
of the elements of $|\theta^* - \theta|$ are minimized as $n \to \infty$ when $F = \mu A'W$, μ being an arbitrary multiplier.

The proof is long but runs on these lines: if $u$ is the error of $\theta^*$ then $FAu = Fe$; Laplace proceeds to
determine the joint characteristic function of $Fe$ and deduces that when all errors have the same distribution,
symmetrical about zero, $Fe$ has a multivariate normal distribution as $n \to \infty$; whence $u$ also
has a multivariate normal distribution, and the expected values of the elements of $|u|$ are determined;
finally, he shows that $F = \mu A'W$ implies the vanishing of the differential coefficients of these expected
values with respect to the elements of $F$.

In more detail, Laplace first takes $s = 1$ and maximizes the probability that his estimate lies between
given limits; he then notes that this is the same as minimizing $E|\theta^* - \theta|$, and continues to use this
criterion when $s = 2$, stating that the result can be extended to greater values of $s$. In the first Supplement
he considers the possibility of a bias in the observations and suggests its removal by introducing an
additional parameter whose coefficient is unity in all equations.

5. Gauss presented his justification in 1821. The paper is written in Latin, but a French translation
was published by Bertrand in 1855 and the fundamental theorem incorporated in Bertrand's own book
of 1888. In the early sections of his paper, Gauss also considers the possibility of bias in his observations
and makes it clear that the preferred estimates are those with minimum variance, although of course he
does not use this terminology. He begins with errors of differing variance, and by choosing suitable multipliers
presents the equations in a form where the errors have the same variance. The proof of the following
theorem is in Art. 20; it is implicit that he is seeking unbiased estimates:

Theorem. Among all the systems of coefficients $B$ $(s \times n)$ which give $Be = \theta - \theta^\dagger$, the estimate $\theta^\dagger$ being
independent of θ, those for which the diagonal elements of $BB'$ are minimized are provided by the method of
least squares.

Put $\xi = A'e$, so that $\xi = A'A\theta - A'x$.

The solution of these equations is $\theta = \theta^* + D\xi$, and with $DA' = E$ this becomes

$$Ee = \theta - \theta^*, \quad\text{so}\quad (\theta^* - \theta^\dagger) = (B - E)e.$$

If this is true for all θ then $(B - E)A = 0$, i.e. on post-multiplying by $D'$, $(B - E)E' = 0$, which implies
$BB' = EE' + (B - E)(B - E)'$. It is now clear that the diagonal elements of $BB'$ are minimized when
$B = E$.

6. Matrix notation has been adopted for brevity in the preceding sections, but no matrix theorems have
been assumed. Taking for granted the now familiar properties of matrices regarding associative products
and inverses, the preceding demonstration can be modernized and shortened.

If $\theta^\dagger = Bx$ is unbiased for all θ, then $BA = I$. With $C = A'A$ it follows that $C^{-1} = BAC^{-1}$, so

$$BB' = (C^{-1}A')(C^{-1}A')' + (B - C^{-1}A')(B - C^{-1}A')',$$

i.e. the diagonal elements of $BB'$ are least when $B = C^{-1}A'$, which is the solution provided by least
squares.
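The decomposition can be checked numerically. The sketch below is ours, with unit error variances (the reduced form with $W = I$) and an arbitrary random $A$. It builds another unbiased linear estimator $B = C^{-1}A' + K$ with $KA = 0$, then confirms $BA = I$, the identity for $BB'$, and that least squares gives the smaller diagonal. All matrices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, s = 12, 3
A = rng.normal(size=(n, s))                   # known design matrix (illustrative values)
C_inv = np.linalg.inv(A.T @ A)
E = C_inv @ A.T                               # least-squares coefficients C^{-1} A'

# Another unbiased linear estimator: B = E + K with K A = 0, obtained by
# projecting a random matrix onto the orthogonal complement of the columns of A.
M = rng.normal(size=(s, n))
K = M @ (np.eye(n) - A @ C_inv @ A.T)
B = E + K

print(np.allclose(B @ A, np.eye(s)))                          # unbiased: BA = I
print(np.allclose(B @ B.T, E @ E.T + (B - E) @ (B - E).T))    # the decomposition above
print(np.diag(E @ E.T) <= np.diag(B @ B.T))                   # least squares is smaller term by term
```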

7. Markoff (1912) devotes Chapter 7 of his book to the method of least squares. He states that each
observation is to be considered as a particular case of many, and as an unbiased estimate of some linear
function of the unknown parameters. His determination of unbiased estimates of these parameters having
minimum variance is closely followed in the paper by David & Neyman (1938).

8. Aitken (1934) has extended the theorem of Gauss by proving that with a known matrix $V$ of variances
and covariances of the observations, the minimum of $(A\theta - x)'V^{-1}(A\theta - x)$ provides estimates $\theta^*$ such
that $\phi^* = P\theta^*$ is an unbiased estimate of $\phi = P\theta$ with minimum variance. Gauss's method can be used
to prove this also.

If $\phi^\dagger = Bx$ then $BA = P$ and

$$BV[P(A'V^{-1}A)^{-1}A'V^{-1}]' = [P(A'V^{-1}A)^{-1}A'V^{-1}]\,V\,[P(A'V^{-1}A)^{-1}A'V^{-1}]',$$

consequently

$$BVB' = [P(A'V^{-1}A)^{-1}A'V^{-1}]\,V\,[P(A'V^{-1}A)^{-1}A'V^{-1}]' + [B - P(A'V^{-1}A)^{-1}A'V^{-1}]\,V\,[B - P(A'V^{-1}A)^{-1}A'V^{-1}]'.$$





If we consider the diagonal elements here, the second term on the right gives a non-negative definite quadratic form, so minimum variance is attained when

$$B = P(A'V^{-1}A)^{-1}A'V^{-1},$$

the solution given by the method indicated above.
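The same decomposition can be checked numerically for Aitken's case. This is a sketch assuming NumPy; the matrices $A$, $V$ and $P$ are arbitrary illustrations (with $V$ made positive definite by construction), and any other unbiased operator is generated as $B = G + K$ with $KA = 0$, where $G = P(A'V^{-1}A)^{-1}A'V^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, q = 12, 3, 2
A = rng.normal(size=(n, k))                 # illustrative design matrix
L = rng.normal(size=(n, n))
V = L @ L.T + n * np.eye(n)                 # illustrative positive-definite covariance matrix
P = rng.normal(size=(q, k))                 # illustrative linear functions phi = P theta

Vinv = np.linalg.inv(V)
# Aitken's operator G = P (A' V^{-1} A)^{-1} A' V^{-1}
G = P @ np.linalg.solve(A.T @ Vinv @ A, A.T @ Vinv)

# Any other unbiased operator: B = G + K with K A = 0
Q, _ = np.linalg.qr(A, mode="complete")
K = rng.normal(size=(q, n - k)) @ Q[:, k:].T
B = G + K

lhs = B @ V @ B.T
rhs = G @ V @ G.T + (B - G) @ V @ (B - G).T
print(np.allclose(B @ A, P), np.allclose(lhs, rhs))
print(np.diag(lhs) >= np.diag(G @ V @ G.T))   # variances of phi^dagger are never smaller
```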

9. It is therefore my opinion that Laplace and Gauss proved theorems which are quite different; that the justification given by Gauss is preferable; and that Markoff, who refers to Gauss's work, may perhaps have clarified assumptions implicit there but proved nothing new. It is evident that Gauss's proof is valid for all values of $n$, entirely free from any assumption of normality, and capable of immediate development.

REFERENCES 

Aitken, A. C. (1934). On least squares and linear combination of observations. Proc. Roy. Soc. Edinb. A, 55, 43-7.

Bertrand, J. (1888). Calcul des probabilités. Paris.

David, F. N. & Neyman, J. (1938). Extension of the Markoff theorem on least squares. Statist. Res. Mem. 3, 105-16.

Gauss, C. F. (1855). Méthode des moindres carrés (trans. J. Bertrand). Paris.

Gauss, C. F. (1873). Theoria combinationis observationum erroribus minimis obnoxiae. Pars prior. Werke, Band 4. Göttingen.

Laplace, P. S., Marquis de (1886). Théorie analytique des probabilités, 3rd edition, Œuvres, 7. Paris.

Markoff, A. A. (1912). Wahrscheinlichkeitsrechnung (trans. H. Liebmann), 2nd edition. Leipzig and Berlin.

Merriman, M. (1877). A list of writings relating to the method of least squares, with historical and critical notes. Trans. Conn. Acad. Arts Sci. 4, 151-232.

Todhunter, I. (1865). A History of the Mathematical Theory of Probability. Macmillan.


The characteristic function of a weighted sum of non-central squares of 
normal variates subject to s linear restraints 

By G. I. BATEMAN

1. Suppose $x_1, x_2, \ldots, x_n$ are independent normal variables with expectations $a_1, a_2, \ldots, a_n$ respectively and with unit variance, that is to say, we suppose that

$$p(x_j) = \frac{1}{\sqrt{2\pi}} \exp\left[-\tfrac{1}{2}(x_j - a_j)^2\right], \qquad (j = 1, 2, \ldots, n).$$


We consider a weighted non-central sum of squares of the type

$$\psi'^2 = \sum_{j=1}^{n} c_j x_j^2,$$

where the $x$'s are as defined and the $c_j$ $(j = 1, 2, \ldots, n)$ are constants. It is assumed that the $x_j$ are subject to $s$ linear restraints

$$\sum_{j=1}^{n} b_{lj} x_j = p_l, \qquad (l = 1, 2, \ldots, s),$$

$b_{lj}$ and $p_l$ $(l = 1, 2, \ldots, s;\ j = 1, 2, \ldots, n)$ being given constants. The characteristic function of the joint distribution of $\psi'^2, p_1, p_2, \ldots, p_s$ may be written down immediately. We have

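$$\phi(t, t_1, \ldots, t_s) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} \exp\left[-\tfrac{1}{2}(x_j - a_j)^2 + itc_j x_j^2 + ix_j \sum_{l=1}^{s} t_l b_{lj}\right] dx_j.$$

The evaluation of the integral is straightforward and we obtain

$$\phi(t, t_1, \ldots, t_s) = \prod_{j=1}^{n} (1 - 2itc_j)^{-1/2} \exp\left(it \sum_{j=1}^{n} \frac{c_j a_j^2}{1 - 2itc_j} + i \sum_{l=1}^{s} B_l t_l - \frac{1}{2} \sum_{l=1}^{s} \sum_{m=1}^{s} A_{lm} t_l t_m\right), \qquad (1)$$

where

$$A_{lm} = \sum_{j=1}^{n} \frac{b_{lj} b_{mj}}{1 - 2itc_j} \quad \text{and} \quad B_l = \sum_{j=1}^{n} \frac{a_j b_{lj}}{1 - 2itc_j}. \qquad (2)$$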
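Equations (1) and (2) can be checked against a direct simulation. A minimal sketch, assuming NumPy; the helper name `joint_cf` and all numerical values of $a_j$, $c_j$, $b_{lj}$, $t$ and $t_l$ are illustrative assumptions. The Monte Carlo average of $\exp(it\psi'^2 + i\sum t_l p_l)$ should agree with (1) to within sampling error.

```python
import numpy as np

def joint_cf(t, tl, a, c, b):
    """Equations (1)-(2): joint characteristic function of psi'^2 and p_1, ..., p_s."""
    d = 1.0 - 2j * t * c                  # 1 - 2 i t c_j, one entry per j
    A = (b / d) @ b.T                     # A_lm = sum_j b_lj b_mj / (1 - 2 i t c_j)
    B = b @ (a / d)                       # B_l  = sum_j a_j b_lj / (1 - 2 i t c_j)
    log_phi = (-0.5 * np.sum(np.log(d))   # prod (1 - 2 i t c_j)^(-1/2); Re(d) = 1 > 0
               + 1j * t * np.sum(c * a**2 / d)
               + 1j * tl @ B
               - 0.5 * tl @ A @ tl)
    return np.exp(log_phi)

rng = np.random.default_rng(2)
n, s = 5, 2
a = rng.normal(size=n)                    # illustrative expectations a_j
c = rng.uniform(0.2, 1.0, size=n)         # illustrative weights c_j
b = rng.normal(size=(s, n))               # illustrative restraint coefficients b_lj
t, tl = 0.3, np.array([0.4, -0.2])

x = a + rng.standard_normal((200_000, n))   # each row is one draw of (x_1, ..., x_n)
psi2 = (x**2) @ c
p = x @ b.T
mc = np.mean(np.exp(1j * (t * psi2 + p @ tl)))
exact = joint_cf(t, tl, a, c, b)
print(mc, exact)
```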





2. Bartlett (1938) considered the case of two variables $u$ and $u_1$, whose joint characteristic function is $\phi(t, t_1)$, and he showed that the characteristic function of $u$ when $u_1$ is fixed, which we denote by $\phi(t \mid u_1)$, is given by

$$\phi(t \mid u_1) = \frac{\int_{-\infty}^{+\infty} e^{-it_1 u_1} \phi(t, t_1)\, dt_1}{\int_{-\infty}^{+\infty} e^{-it_1 u_1} \phi(0, t_1)\, dt_1}.$$
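Bartlett's ratio can be evaluated numerically whenever $\phi(t, t_1)$ is known in closed form. The sketch below assumes NumPy and takes, purely as an illustration, a standard bivariate normal pair $(u, u_1)$ with correlation $\rho$, for which $u$ given $u_1$ is normal with mean $\rho u_1$ and variance $1 - \rho^2$. Both integrals are approximated by sums on a uniform grid; the grid spacing cancels in the ratio. The helper names are hypothetical.

```python
import numpy as np

def conditional_cf(t, u1, joint_cf, half_width=30.0, m=20001):
    """Bartlett's ratio: int e^{-i t1 u1} phi(t, t1) dt1 / int e^{-i t1 u1} phi(0, t1) dt1."""
    t1 = np.linspace(-half_width, half_width, m)
    kernel = np.exp(-1j * t1 * u1)
    num = np.sum(kernel * joint_cf(t, t1))
    den = np.sum(kernel * joint_cf(0.0, t1))
    return num / den                       # uniform grid spacing cancels in the ratio

rho = 0.6

def bivariate_normal_cf(t, t1):
    """Characteristic function of a standard bivariate normal with correlation rho."""
    return np.exp(-0.5 * (t**2 + 2 * rho * t * t1 + t1**2))

u1, t = 1.3, 0.8
approx = conditional_cf(t, u1, bivariate_normal_cf)
exact = np.exp(1j * rho * u1 * t - 0.5 * (1 - rho**2) * t**2)
print(approx, exact)
```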


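This result may be easily, and obviously, extended to the case of $s + 1$ variables $u, u_1, \ldots, u_s$. Write $p(u, u_1, \ldots, u_s)$ for the joint probability distribution of $u, u_1, \ldots, u_s$ and $\phi(t, t_1, t_2, \ldots, t_s)$ for its characteristic function. It follows that

$$\phi(t, t_1, \ldots, t_s) = \int_{-\infty}^{+\infty} \!\cdots\! \int_{-\infty}^{+\infty} \exp\left[itu + i\sum_{l=1}^{s} t_l u_l\right] p(u, u_1, \ldots, u_s)\, du\, du_1 \ldots du_s$$

$$= \int_{-\infty}^{+\infty} \!\cdots\! \int_{-\infty}^{+\infty} \exp\left[itu + i\sum_{l=1}^{s} t_l u_l\right] p(u \mid u_1, \ldots, u_s)\, p(u_1, \ldots, u_s)\, du\, du_1 \ldots du_s.$$

Writing

$$\phi(t \mid u_1, u_2, \ldots, u_s) = \int_{-\infty}^{+\infty} e^{itu} p(u \mid u_1, u_2, \ldots, u_s)\, du,$$

we have

$$\phi(t, t_1, \ldots, t_s) = \int_{-\infty}^{+\infty} \!\cdots\! \int_{-\infty}^{+\infty} \phi(t \mid u_1, \ldots, u_s)\, p(u_1, \ldots, u_s) \exp\left[i\sum_{l=1}^{s} t_l u_l\right] du_1 \ldots du_s,$$

whence, using the Fourier transform,

$$\phi(t \mid u_1, \ldots, u_s)\, p(u_1, \ldots, u_s) = \frac{1}{(2\pi)^s} \int_{-\infty}^{+\infty} \!\cdots\! \int_{-\infty}^{+\infty} \exp\left[-i\sum_{l=1}^{s} t_l u_l\right] \phi(t, t_1, \ldots, t_s)\, dt_1 \ldots dt_s.$$

It follows that

$$\phi(t \mid u_1, \ldots, u_s) = \frac{\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \exp\left[-i\sum_{l=1}^{s} t_l u_l\right] \phi(t, t_1, \ldots, t_s)\, dt_1 \ldots dt_s}{\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \exp\left[-i\sum_{l=1}^{s} t_l u_l\right] \phi(0, t_1, \ldots, t_s)\, dt_1 \ldots dt_s}. \qquad (3)$$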


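3. We proceed to apply this result to find the characteristic function of $\psi'^2$ of §1. Substituting in the right-hand side of (3) the joint characteristic function of $\psi'^2, p_1, p_2, \ldots, p_s$ as given by (1), we obtain, after some reduction, that the characteristic function of $\psi'^2$, given $p_1, \ldots, p_s$, is

$$\phi(t \mid p_1, \ldots, p_s) = \prod_{j=1}^{n} (1 - 2itc_j)^{-1/2} \exp\left(it\sum_{j=1}^{n} \frac{c_j a_j^2}{1 - 2itc_j}\right) \frac{\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \exp\left[i\sum_{l=1}^{s}(B_l - p_l)t_l - \frac{1}{2}\sum_{l=1}^{s}\sum_{m=1}^{s} A_{lm} t_l t_m\right] dt_1 \ldots dt_s}{\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \exp\left[i\sum_{l=1}^{s}(\beta_l - p_l)t_l - \frac{1}{2}\sum_{l=1}^{s}\sum_{m=1}^{s} \alpha_{lm} t_l t_m\right] dt_1 \ldots dt_s}, \qquad (4)$$

where

$$\alpha_{lm} = \sum_{j=1}^{n} b_{lj} b_{mj} \quad \text{and} \quad \beta_l = \sum_{j=1}^{n} a_j b_{lj}, \qquad (5)$$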


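and $A_{lm}$ and $B_l$ are as given in (2). The multiple integral in the numerator of (4) can be evaluated and is

$$\frac{(2\pi)^{s/2}}{|A|^{1/2}} \exp\left[-\tfrac{1}{2}(B - p)'A^{-1}(B - p)\right],$$

where $A$ is the matrix $\{A_{lm}\}$, $l, m = 1, 2, \ldots, s$, $A^{-1}$ the inverse matrix of $A$, $|A|$ the determinant associated with $A$, $(B - p)$ the column vector $\{B_l - p_l\}$, $l = 1, 2, \ldots, s$, and $(B - p)'$ the corresponding row vector. The denominator of (4) may be evaluated in a similar way and we have finally

$$\phi(t \mid p_1, \ldots, p_s) = \prod_{j=1}^{n} (1 - 2itc_j)^{-1/2} \left(\frac{|\alpha|}{|A|}\right)^{1/2} \exp\left[it\sum_{j=1}^{n} \frac{c_j a_j^2}{1 - 2itc_j} - \tfrac{1}{2}(B - p)'A^{-1}(B - p) + \tfrac{1}{2}(\beta - p)'\alpha^{-1}(\beta - p)\right], \qquad (6)$$

where $\alpha$ is the matrix $\{\alpha_{lm}\}$, $l, m = 1, 2, \ldots, s$, and $\beta$ the vector $\{\beta_l\}$, $l = 1, 2, \ldots, s$. The probability law of $\psi'^2$ for given restraints can be evaluated in certain cases.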
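Equation (6) can be compared with a direct simulation of the conditional distribution. This sketch assumes NumPy and uses the standard fact (not stated in the note) that, for independent unit-variance normals, $x$ given $bx = p$ is normal with mean $a + b'(bb')^{-1}(p - ba)$ and covariance $I - b'(bb')^{-1}b$. All numerical values and the helper name `conditional_cf_eq6` are illustrative. The square roots of the complex determinants are taken through logarithms of eigenvalues, an implementation choice intended to stay on the branch that equals 1 at $t = 0$ for moderate $t$.

```python
import numpy as np

def conditional_cf_eq6(t, a, c, b, p):
    """Equation (6): characteristic function of psi'^2 = sum c_j x_j^2 given b x = p."""
    d = 1.0 - 2j * t * c                        # 1 - 2 i t c_j
    A = (b / d) @ b.T                           # A_lm from (2)
    B = b @ (a / d)                             # B_l from (2)
    alpha = b @ b.T                             # alpha_lm from (5)
    beta = b @ a                                # beta_l from (5)
    log_phi = -0.5 * np.sum(np.log(d)) + 1j * t * np.sum(c * a**2 / d)
    # (|alpha| / |A|)^(1/2) through eigenvalue logarithms
    log_phi += 0.5 * np.sum(np.log(np.linalg.eigvalsh(alpha)))
    log_phi -= 0.5 * np.sum(np.log(np.linalg.eigvals(A)))
    rB, rb = B - p, beta - p
    log_phi += -0.5 * rB @ np.linalg.solve(A, rB) + 0.5 * rb @ np.linalg.solve(alpha, rb)
    return np.exp(log_phi)

rng = np.random.default_rng(3)
n, s, t = 6, 2, 0.25
a = rng.normal(size=n)                  # illustrative a_j
c = rng.uniform(0.2, 1.0, size=n)       # illustrative weights c_j
b = rng.normal(size=(s, n))             # illustrative restraint coefficients b_lj
p = np.array([0.5, -1.0])               # illustrative restraint values p_l

# Draw x conditional on b x = p (projection I - H is symmetric and idempotent).
H = b.T @ np.linalg.solve(b @ b.T, b)
m = a + b.T @ np.linalg.solve(b @ b.T, p - b @ a)
x = m + rng.standard_normal((200_000, n)) @ (np.eye(n) - H)

mc = np.mean(np.exp(1j * t * (x**2 @ c)))
exact = conditional_cf_eq6(t, a, c, b, p)
print(mc, exact)
```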





4. The distribution of the unweighted sum of non-central squares of normal deviates subject to $s$ orthogonal linear restraints can be obtained as a special case of the foregoing. It has already been derived by Patnaik (1949) using a different method. We put $c_j = 1$ for all $j$, when

$$\psi'^2 = \sum_{j=1}^{n} x_j^2,$$

where the $x$'s are as previously defined and

$$E(x_j) = a_j \qquad (j = 1, 2, \ldots, n).$$


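If the restrictions are orthogonal, i.e. if

$$\sum_{j=1}^{n} b_{lj} b_{mj} = \alpha_{lm} = 1 \text{ if } l = m, \qquad = 0 \text{ if } l \neq m,$$

then from (2) and (5) we see that

$$A_{lm} = (1 - 2it)^{-1}\alpha_{lm} = (1 - 2it)^{-1}\delta_{lm} \quad \text{and} \quad B_l = (1 - 2it)^{-1}\beta_l.$$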

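Substituting in (4), it follows that

$$\phi(t \mid p_1, \ldots, p_s) = (1 - 2it)^{-\frac{1}{2}(n-s)} \exp\left[\frac{it}{1 - 2it}\left(\sum_{j=1}^{n} a_j^2 - \sum_{l=1}^{s} \beta_l^2\right) + it\sum_{l=1}^{s} p_l^2\right]$$

$$= (1 - 2it)^{-\frac{1}{2}(n-s)} \exp\left[\frac{\lambda it}{1 - 2it} + it\sum_{l=1}^{s} p_l^2\right], \qquad (7)$$

where

$$\lambda = \sum_{j=1}^{n} a_j^2 - \sum_{l=1}^{s}\left(\sum_{j=1}^{n} a_j b_{lj}\right)^2. \qquad (8)$$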

We recall that the characteristic function of the non-central $\chi^2$, referred to as $\chi'^2$, is

$$\phi_{\chi'^2}(t) = (1 - 2it)^{-\frac{1}{2}\nu} \exp\left[\frac{\lambda it}{1 - 2it}\right].$$

Subtracting the constant $\sum_l p_l^2$ from $\psi'^2$ removes the factor $\exp\left(it\sum_l p_l^2\right)$ from (7). Hence, using the uniqueness property of the characteristic function and the fact that there is a (1, 1) correspondence between the characteristic function of a variable and its probability law, it follows that when the $x_j$'s are subject to $s$ orthogonal linear restraints,

$$\sum_{j=1}^{n} x_j^2 - \sum_{l=1}^{s} p_l^2$$

is distributed as $\chi'^2$ with $(n - s)$ degrees of freedom and with parameter $\lambda$ as given by (8).
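A quick simulation of this special case, assuming NumPy. The values of $a_j$ and $p_l$ are illustrative, the orthonormal restraint rows are generated by a QR factorization, and the check uses the first two moments of $\chi'^2$ implied by its characteristic function, namely mean $\nu + \lambda$ and variance $2\nu + 4\lambda$ with $\nu = n - s$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, s = 7, 3
a = rng.normal(size=n)                           # illustrative expectations a_j
Qm, _ = np.linalg.qr(rng.normal(size=(n, s)))    # n x s matrix with orthonormal columns
b = Qm.T                                         # s x n restraint matrix with b b' = I
p = rng.normal(size=s)                           # illustrative restraint values p_l

lam = np.sum(a**2) - np.sum((b @ a) ** 2)        # lambda from equation (8)

# Conditional law of x given b x = p: mean a + b'(p - b a), covariance I - b'b.
H = b.T @ b
m = a + b.T @ (p - b @ a)
x = m + rng.standard_normal((400_000, n)) @ (np.eye(n) - H)
stat = np.sum(x**2, axis=1) - np.sum(p**2)

nu = n - s
print(stat.mean(), nu + lam)          # mean of chi'^2
print(stat.var(), 2 * nu + 4 * lam)   # variance of chi'^2
```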


REFERENCES 

Bartlett, M. S. (1938). J. Lond. Math. Soc. 13, 62.
Patnaik, P. B. (1949). Biometrika, 36, 202.





Intra-class rank correlation

By J. W. WHITFIELD 
Psychological Laboratory, University of Cambridge 

Given a number of pairs of items each of which is measured according to some quality, the normal procedure to discover whether the arrangement in pairs bears any relation to the quality measured is to calculate the intra-class correlation coefficient. As an example we may consider pairs of brothers and a measure of their stature. If we wish to know whether there is any correlation between older brothers and younger brothers on stature, we calculate the product-moment correlation in the ordinary way. But if the more general question whether brothers tend to have similar stature is asked, there are no grounds upon which we can differentiate between each member of a pair so as to separate them into the two arrays for correlation. The mean correlation for all possible arrangements is required. This is the intra-class correlation, which can be calculated without performing the numerous individual product-moment correlations. With quantitative data the procedure is not limited to paired data, but may be performed with arrangements of groups of varying size.

The same type of problem can arise with ranked data. Consider eight pairs of brothers, ranked upon some quality which is not amenable to quantitative measure:


Rank values

First pair:     1 and 6
Second pair:    5 and 9
Third pair:     7 and 8
Fourth pair:   11 and 13
Fifth pair:     2 and 4
Sixth pair:    15 and 16
Seventh pair:  10 and 14
Eighth pair:    3 and 12


There are 256 possible arrangements for correlation, although only 128 need actually be calculated. These are distributed (using Kendall's τ coefficient of rank correlation; Kendall, 1948) as follows:

Frequency    S value
    8          +18
   32          +16
   40          +14
   16          +12
    8          +10
   16           +8
    8           +6
Total 128    +1664

Hence the mean S is +13, and the mean τ is +0.4643.
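The 256 arrangements can be enumerated mechanically. A minimal sketch in Python, assuming Kendall's $S$ is the sum, over all pairs of items, of the product of the signs of the differences in the two arrays, and that an arrangement is fixed by choosing which member of each pair goes into the first array. Swapping every pair at once leaves $S$ unchanged, which is why only 128 arrangements need be calculated. The helper name `kendall_s` is hypothetical.

```python
from collections import Counter
from itertools import product

pairs = [(1, 6), (5, 9), (7, 8), (11, 13), (2, 4), (15, 16), (10, 14), (3, 12)]

def kendall_s(x, y):
    """Kendall's S: sum over i < j of sign(x_i - x_j) * sign(y_i - y_j)."""
    s = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            s += ((x[i] > x[j]) - (x[i] < x[j])) * ((y[i] > y[j]) - (y[i] < y[j]))
    return s

scores = []
for flips in product((False, True), repeat=len(pairs)):
    # flip=True sends the higher-numbered member of a pair to the first array
    x = [hi if f else lo for (lo, hi), f in zip(pairs, flips)]
    y = [lo if f else hi for (lo, hi), f in zip(pairs, flips)]
    scores.append(kendall_s(x, y))

dist = Counter(scores)                      # frequency of each S over all 256 arrangements
mean_s = sum(scores) / len(scores)
mean_tau = mean_s / (len(pairs) * (len(pairs) - 1) / 2)
print(sorted(dist.items(), reverse=True), mean_s, round(mean_tau, 4))
```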

The computation of S for all possible arrangements would be excessively tedious for all data except 
those involving very few observations. The following alternative computation, though restricted to the 
case of paired data only, is much simpler. It can probably be extended to include family groupings other 
than two. 

Rearrange the data in the example in order of the higher rank in each pair, i.e. first the pair with the highest-ranked individual in it, next the pair with the highest-ranked individual from the remainder, etc., giving (1, 6) (2, 4) (3, 12) (5, 9) (7, 8) (10, 14) (11, 13) (15, 16).




Probability that a given value of S will be attained or exceeded by chance
(single tail only, distribution symmetrical about zero)

 S    n = 6    n = 8    n = 10   n = 12   n = 14   n = 16   n = 18   n = 20
 0    0.50000  0.50000  0.50000  0.50000  0.50000  0.50000  0.50000  0.50000
 2    0.40000  0.42857  0.44868  0.46080  0.46875  0.47432  0.47842  0.48153
 4    0.20000  0.29524  0.34921  0.38374  0.40693  0.42336  0.43549  0.44473
 6    0.06667  0.18095  0.26820  0.31063  0.34717  0.37350  0.39320  0.40838
 8    -        0.09524  0.17989  0.24367  0.29069  0.32564  0.36217  0.37276
10    -        0.03810  0.11640  0.18461  0.23855  0.28025  0.31204  0.33813
12    -        0.00952  0.06878  0.13499  0.19156  0.23794  0.27502  0.30475
14    -        -        0.03598  0.09370  0.15023  0.19913  0.23964  0.27283
16    -        -        0.01687  0.06195  0.11483  0.16412  0.20673  0.24257
18    -        -        0.00529  0.03848  0.08532  0.13309  0.17649  0.21412
20    -        -        0.00106  0.02213  0.06143  0.10606  0.14903  0.18700
22    -        -        -        0.01164  0.04268  0.08296  0.12440  0.16309
24    -        -        -        0.00629  0.02843  0.06359  0.10258  0.14065
26    -        -        -        0.00202  0.01814  0.04769  0.08352  0.12028
28    -        -        -        0.00068  0.01093  0.03492  0.06708  0.10196
30    -        -        -        0.00010  0.00616  0.02490  0.05310  0.08566
32    -        -        -        -        0.00320  0.01726  0.04140  0.07127
34    -        -        -        -        0.00150  0.01156  0.03175  0.05871
36    -        -        -        -        0.00061  0.00747  0.02392  0.04786
38    -        -        -        -        0.00021  0.00462  0.01768  0.03859
40    -        -        -        -        0.00006  0.00272  0.01280  0.03076
42    -        -        -        -        0.00001  0.00151  0.00906  0.02421
44    -        -        -        -        -        0.00078  0.00626  0.01882
46    -        -        -        -        -        0.00037  0.00420  0.01442
48    -        -        -        -        -        0.00016  0.00274  0.01089
50    -        -        -        -        -        0.00006  0.00172  0.00810
52    -        -        -        -        -        0.00002  0.00104  0.00592
54    -        -        -        -        -        0.0⁶4    0.00000  0.00425
56    -        -        -        -        -        0.0?6    0.00033  0.00299
58    -        -        -        -        -        -        0.00017  0.00206
60    -        -        -        -        -        -        0.00008  0.00138
62    -        -        -        -        -        -        0.00004  0.00091
64    -        -        -        -        -        -        0.00001  0.00058
66    -        -        -        -        -        -        0.0?5    0.00036
68    -        -        -        -        -        -        0.0⁶1    0.00021
70    -        -        -        -        -        -        0.0?3    0.00012
72    -        -        -        -        -        -        0.0⁷3    0.00007
74    -        -        -        -        -        -        -        0.00003
76    -        -        -        -        -        -        -        0.00002
78    -        -        -        -        -        -        -        0.00001
80    -        -        -        -        -        -        -        0.0⁶3
82    -        -        -        -        -        -        -        0.0⁶1
84    -        -        -        -        -        -        -        0.0?3
86    -        -        -        -        -        -        -        0.0⁷8
88    -        -        -        -        -        -        -        0.0⁷2
90    -        -        -        -        -        -        -        0.0?2
β₂    2.17     2.42     2.56     2.63     2.68     2.72     2.76     2.80




From this we get


and hence


_ n 5 n 6 2« 4 2n a 4n s , 8w 
^ “ 108 ~ 75 27' + ”i6 + 27 + 220’ 

■ R q 4,32 


Thus $\beta_2$ will approach the value 3 as $n$ increases, and hence for reasonable values of $n$ the normal deviate test can be employed.

Uses of method

It seems to the writer that there are several situations in experimental psychology where this method will be of use, in addition to the type of situation used in the example. One such possibility concerns the problem of relating preference judgements to particular qualities of the objects judged. As an example, in preference judgements about different types of chair, height of seat from the ground may be an important factor, but the optimum may vary from individual to individual. If various types of chairs were chosen so that they could be considered as a number of pairs with regard to this quality (a kind of ranking replication), the preference judgements could be treated in this fashion independently for each judge, to discover whether or not seat height is an important criterion in the preferences.


REFERENCE 

Kendall, M. G. (1948). Rank Correlation Methods. London: Griffin and Co.


A note on non-normal correlation

By J. B. S. HALDANE 

The product-moment correlation $\rho$ is frequently estimated for two variates which are not normally distributed. There are, however, no general expressions for the effect of this non-normal distribution on the precision of the estimate of $\rho$. They may be obtained in one special case which is of biological importance. Suppose $X$ and $Y$ are two correlated variates. Then if

$$X = a + \frac{\sigma}{\sqrt{2}}\left[(1+\rho)^{1/2}x + (1-\rho)^{1/2}y\right], \qquad Y = b + \frac{\tau}{\sqrt{2}}\left[(1+\rho)^{1/2}x - (1-\rho)^{1/2}y\right],$$

and further if $\bar{x} = \bar{y} = 0$, $\overline{x^2} = \overline{y^2} = 1$, and $x$ and $y$ are independent, the variance of $X$ is $\sigma^2$, that of $Y$ is $\tau^2$, and their covariance is $\rho\sigma\tau$, regardless of the distributions of $x$ and $y$. Hence the correlation of $X$ and $Y$ is $\rho$. If $x$ and $y$ are normally distributed the correlation is of course normal. Now in biological statistics $X$ and $Y$ may be measurements of two organs in the same individual, or of their logarithms. Here $x$ depends on the sum of causes which affect $X$ and $Y$ alike, $y$ on the sum of causes which affect them oppositely. For example, in any series of specimens, not all of which are fully grown, $x$ will increase with age up to a certain point; and in a population containing a minority of juvenile members the distribution of $x$ will probably be negatively skew. But $y$ may be quite independent of age if the variability of the organs measured is uncorrelated with age, and may well be normally distributed when $x$ is not.

Let $\kappa_{rs}$ be the cumulants of the joint distribution of $x$ and $y$. Then since they are independent, $\kappa_{rs} = 0$ unless $r$ or $s = 0$; $\kappa_{10} = \kappa_{01} = 0$, $\kappa_{20} = \kappa_{02} = 1$; and let $\kappa_{30} = \gamma_1$, $\kappa_{03} = \gamma_1'$, $\kappa_{40} = \gamma_2$, $\kappa_{04} = \gamma_2'$, etc., these being measures of the deviations from normality of the distributions of $x$ and $y$.

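Our estimate of $\rho$ on a sample of $n$ members is thus

$$r = \frac{n\sum X_r Y_r - \sum X_r \sum Y_r}{\left[n\sum X_r^2 - \left(\sum X_r\right)^2\right]^{1/2}\left[n\sum Y_r^2 - \left(\sum Y_r\right)^2\right]^{1/2}}$$

$$= \left[n(1+\rho)\sum x_r^2 - n(1-\rho)\sum y_r^2 - (1+\rho)\left(\sum x_r\right)^2 + (1-\rho)\left(\sum y_r\right)^2\right]$$

$$\times \left[n(1+\rho)\sum x_r^2 + n(1-\rho)\sum y_r^2 + 2n(1-\rho^2)^{1/2}\sum x_r y_r - (1+\rho)\left(\sum x_r\right)^2 - (1-\rho)\left(\sum y_r\right)^2 - 2(1-\rho^2)^{1/2}\sum x_r\sum y_r\right]^{-1/2}$$

$$\times \left[n(1+\rho)\sum x_r^2 + n(1-\rho)\sum y_r^2 - 2n(1-\rho^2)^{1/2}\sum x_r y_r - (1+\rho)\left(\sum x_r\right)^2 - (1-\rho)\left(\sum y_r\right)^2 + 2(1-\rho^2)^{1/2}\sum x_r\sum y_r\right]^{-1/2}.$$

So

$$r^2 = \frac{\left[(1+\rho)\{n\sum x_r^2 - (\sum x_r)^2\} - (1-\rho)\{n\sum y_r^2 - (\sum y_r)^2\}\right]^2}{\left[(1+\rho)\{n\sum x_r^2 - (\sum x_r)^2\} + (1-\rho)\{n\sum y_r^2 - (\sum y_r)^2\}\right]^2 - 4(1-\rho^2)\left(n\sum x_r y_r - \sum x_r\sum y_r\right)^2}$$

$$= \frac{\left[(1+\rho)k_{20} - (1-\rho)k_{02}\right]^2}{\left[(1+\rho)k_{20} + (1-\rho)k_{02}\right]^2 - 4(1-\rho^2)k_{11}^2}, \qquad (1)$$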




where $k_{rs}$ is the unbiased estimate of $\kappa_{rs}$ from the moments of the variates in the sample. For example,

$$k_{20} = \frac{n\sum x_r^2 - \left(\sum x_r\right)^2}{n(n-1)}.$$

We can now show how the mean value of $r^2$ will be affected by deviations of the distributions of $x$ and $y$ from normality. $\overline{k_{20}^2}$ exceeds $(\kappa_{20})^2$, or unity, by the sampling variance of $k_{20}$, which is $\frac{2\kappa_{20}^2}{n-1} + \frac{\kappa_{40}}{n}$, or $\frac{2}{n-1} + \frac{\gamma_2}{n}$. The effect of non-normality in the distribution of $x$ is therefore to increase the mean value of $k_{20}^2$ by $\gamma_2/n$. Similarly, $\overline{k_{02}^2}$ is increased by $\gamma_2'/n$. $\overline{k_{20}k_{02}}$ is not increased, since $x$ and $y$ are independent. $\overline{k_{11}^2}$ does not include terms with zero suffixes, so it is also unaltered. In fact, both numerator and denominator of (1) are increased by $n^{-1}\left[(1+\rho)^2\gamma_2 + (1-\rho)^2\gamma_2'\right]$.
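Two of the steps above can be checked numerically: that (1) is an exact identity between $r^2$ and the k-statistics of $x$ and $y$, and that the mean of $k_{20}^2$ is $1 + 2/(n-1) + \gamma_2/n$. A minimal sketch, assuming NumPy; taking $x$ to be a Laplace variate scaled to unit variance (so $\gamma_2 = 3$) and $y$ standard normal is an illustrative choice, as are the values of $n$, $\rho$, $\sigma$, $\tau$, $a$, $b$ and the helper name `k_stats`.

```python
import numpy as np

def k_stats(x, y):
    """Unbiased k-statistics k20, k02, k11 for paired samples x, y."""
    n = len(x)
    k20 = (n * np.sum(x**2) - np.sum(x) ** 2) / (n * (n - 1))
    k02 = (n * np.sum(y**2) - np.sum(y) ** 2) / (n * (n - 1))
    k11 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * (n - 1))
    return k20, k02, k11

rng = np.random.default_rng(5)
n, rho, sigma, tau, a, b = 50, 0.7, 2.0, 3.0, 1.0, -1.0

# x: Laplace with unit variance (gamma_2 = 3); y: standard normal (gamma_2' = 0)
x = rng.laplace(scale=1 / np.sqrt(2), size=n)
y = rng.standard_normal(n)
X = a + sigma / np.sqrt(2) * (np.sqrt(1 + rho) * x + np.sqrt(1 - rho) * y)
Y = b + tau / np.sqrt(2) * (np.sqrt(1 + rho) * x - np.sqrt(1 - rho) * y)

r_direct = np.corrcoef(X, Y)[0, 1]
k20, k02, k11 = k_stats(x, y)
r2_eq1 = ((1 + rho) * k20 - (1 - rho) * k02) ** 2 / (
    ((1 + rho) * k20 + (1 - rho) * k02) ** 2 - 4 * (1 - rho**2) * k11**2
)

# Mean of k20^2 over many samples versus 1 + 2/(n - 1) + gamma_2/n
xs = rng.laplace(scale=1 / np.sqrt(2), size=(100_000, n))
k20s = (n * np.sum(xs**2, axis=1) - np.sum(xs, axis=1) ** 2) / (n * (n - 1))
mean_k20_sq = np.mean(k20s**2)
predicted = 1 + 2 / (n - 1) + 3 / n
print(r_direct**2, r2_eq1, mean_k20_sq, predicted)
```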

We cannot calculate the variance of $r$ directly from (1) since $\bar{r}$ differs from $\rho$ by a quantity of order $n^{-1}$. But since both the numerator and denominator are increased by

$$(1+\rho)^2\frac{\gamma_2}{n} + (1-\rho)^2\frac{\gamma_2'}{n}, \quad \text{or} \quad n^{-1}\left[(1+\rho)^2\gamma_2 + (1-\rho)^2\gamma_2'\right]$$

above the values found when the distributions of $x$ and $y$ are normal, we have in the normal case

$$\overline{r^2} = \frac{4\rho^2 + 4n^{-1}P}{4 + 4n^{-1}Q},$$

where $P$ and $Q$ are independent of $n$ to order $n^{-2}$, and in general

$$\overline{r^2} = \frac{4\rho^2 + n^{-1}\left[4P + (1+\rho)^2\gamma_2 + (1-\rho)^2\gamma_2'\right]}{4 + n^{-1}\left[4Q + (1+\rho)^2\gamma_2 + (1-\rho)^2\gamma_2'\right]}.$$

So $\overline{r^2}$ is increased by

$$\frac{1-\rho^2}{4n}\left[(1+\rho)^2\gamma_2 + (1-\rho)^2\gamma_2'\right] + O(n^{-2}).$$

The variance of $r$ is therefore increased by this quantity. The precision of the estimate of $\rho$ does not therefore depend on the skewness of the distributions of $x$ and $y$, provided they are mesokurtic. And since

$$\gamma_1 = \sqrt{2}\,(1+\rho)^{-3/2}\left[\gamma_1(X) + \gamma_1(Y)\right], \qquad \gamma_1' = \sqrt{2}\,(1-\rho)^{-3/2}\left[\gamma_1(X) - \gamma_1(Y)\right],$$

it follows that skew distributions of $X$ and $Y$ will not affect the precision of $r$. On the other hand, the distributions of $X$ and $Y$ have the same value of $\gamma_2$, or $\beta_2 - 3$, namely,

$$\Gamma_2 = \tfrac{1}{4}\left[(1+\rho)^2\gamma_2 + (1-\rho)^2\gamma_2'\right].$$

Hence

$$\operatorname{var}(r) = \frac{1-\rho^2}{n}\left(1 - \rho^2 + \Gamma_2\right) + O(n^{-2}). \qquad (2)$$

If we employ Fisher's transformation $z = \tfrac{1}{2}\log\left(\frac{1+r}{1-r}\right)$, we find

$$\operatorname{var}(z) = n^{-1}\left(1 + \frac{\Gamma_2}{1-\rho^2}\right) + O(n^{-2}). \qquad (3)$$

The variance of $z$ is thus no longer almost independent of $\rho$. But the precision of $r$ is increased if the distributions of $X$ and $Y$ are platykurtic, and decreased if they are leptokurtic. Clearly, however, (2) and (3) are inapplicable when $|\rho|$ is near unity, terms of at least order $n^{-2}$ being required.

On empirical grounds, E. S. Pearson (1931, 1932) stated that 'the normal bivariate surface may be distorted and mutilated to a remarkable degree without affecting the frequency distribution of r'. This would seem to be true when $|\rho|$ is not near unity. It is also true for 'mutilations' which affect skewness without doing a great deal to kurtosis. However, when correlation is high it would seem that a relatively slight change in kurtosis may have a large effect on the variance of $r$.

It is possible that the formula (1) might serve as a basis for a new development of the theory of the distribution of $r$ in the normal case, and further information could certainly be obtained from it concerning the more general case here considered. In the most general case $x$ and $y$, though they have a zero coefficient of correlation, are not independent, so mixed cumulants such as $\kappa_{22}$ would not in general be zero, and it is doubtful whether the method would be of value. On the other hand, if the distributions of $X$ and $Y$, though having different values of $\beta_1$, have insignificantly different values of $\beta_2$, equations (2) or (3) may be used with some confidence.

I have to thank Mr K. A. Kermack for useful criticism. 

REFERENCE 

Pearson, E. S. (1931, 1932). The test of significance for the correlation coefficient. J. Amer. Statist. Ass. 26, 128; 28, 424.





REVIEWS 


Probability Theory for Statistical Methods. By F. N. David. Cambridge University Press. 1949. Price 16s.


This book deals with the simpler parts of probability theory, in so far as they are relevant to common statistical techniques. It is intended primarily for students who have reached intermediate standard in mathematics, but it will be found useful by all who are interested in statistical methods, as it gathers together a number of useful results which otherwise are very much scattered throughout the literature. The book begins with a discussion of the definition of probability; after considering various viewpoints, the author decides on a definition which is adequate for the purposes of the book (except that, strictly speaking, in the form given it applies only to rational probabilities, whereas when the normal distribution is being considered we shall need irrational ones too). After this the author goes on to consider the binomial distribution. Three ways of summing the tails of a binomial are discussed: the exact method, using the incomplete beta function; a not very well known but very convenient continued fraction due to Markoff and Muller; and also the usual normal approximation.

Various generalizations of the binomial distribution are then discussed: Poisson's limit, the negative binomial, Neyman's contagious distribution, and, in later chapters, the multinomial distribution, and Poisson's and Lexis's forms of the binomial for heterogeneous data. Methods are given for fitting most of these distributions to observed data.

A chapter is devoted to Bayes's Theorem, pointing out the conditions under which it may be validly applied, and also to confidence intervals, and their uses. There follows a chapter on simple genetical applications of the theory, largely replacing the usual sequence of problems on drawing black and white balls from urns, throwing dice, etc. This seems a very welcome development, provided that it is accompanied by a sympathetic understanding of at least the elements of genetics, and genes are not going to be regarded in future simply as objects conveniently provided by nature for the construction of new types of examination questions. Incidentally, Dr David is over-cautious in her discussion of random mating, saying that it is 'rarely met with in practice', whereas in fact with genes such as the blood-group genes it can be shown to hold with considerable accuracy.

Three chapters deal with random variables, their distributions, 'expected' values, and moments, and with the law of large numbers. The simpler moments of the sample mean and sample variance are worked out in some detail, both for infinite and (in the case of the mean) for finite populations.

Two chapters on estimation follow. Here the estimates are practically restricted to linear unbiased estimates; and the Generalized Markoff Theorem provides a method of obtaining these estimates. A proof of the theorem is given for the case of two parameters. One important application of this theorem is to linear regression, and another to the problem of efficient sampling of human and other populations. Here the methods of stratified and restricted stratified sampling are explained and discussed in some detail.

The final three chapters are concerned with characteristic functions and their elementary properties. Here we are concerned with finding the characteristic function of a known distribution, and with the converse, with the connexion between characteristic functions, moments, and cumulants, and with the central limit theorem.

The style of the book is simple, straightforward, and lucid, refreshingly so; and there are few errors, mostly trivial ones. The only two places where the exposition is not quite up to the standard of the rest of the book seem to be the 'problem of points', p. 26, where the condition r < t might be brought in earlier, and the calculation on pp. 00-7, where the sentence 'we may assume therefore that A₁ = Rr' gives rise to what seem at first sight rather doubtful calculations of the probabilities of various genotypes for A₂; if, however, we say instead 'let A₁ be the parent who passed on the r gene to B₂' then these calculations are completely justified. Incidentally the phrase 'for example, colour-blindness' which occurs in the examination question quoted here seems to be singularly unfortunate, as colour-blindness is not inherited in this manner, nor has it a gene frequency as low as 0.001. Other omissions which may cause some difficulty occur on pp. 16-16, where the product law is used before it has been proved; in chapter xv, where the fact that two different distributions cannot have the same characteristic function is frequently used but nowhere stated; and on p. 214, where the independence of the grouping error is tacitly assumed. It is a pity that means and moments are not defined on p. 31, and their simpler relations proved, since apart from this omission the book is self-contained. Small errors occur on p. 66, line 6, where the numerator should be |k - np| - ½; p. 59, line 8, where the inequality should read ... < j 1 ~ 5; p. 63, second formula,





read (2x + t) for (2n + t) in square brackets; p. 96, line 9, for 'their offspring' read 'a single one of their offspring' (this was expressed badly in the original question); p. 107, line 9, for 10⁶ read 10⁵; p. 126, line 10, for (n/n- read [n/(n - 1)]½; p. 128, line 4 from foot, for 'central difference' read 'linear difference'; p. 147, Markoff's Lemma should read '... ≤ 1/t', and similarly on p. 148, the Bienaymé-Tchebycheff inequality '≥ 1 - 1/λ²'; p. 165, section (iii), 'then equations for ...'. Finally, on p. 205, the definition of boundedness does not seem to be completely happily phrased.

There are also a few points on which the reviewer would like to express a personal opinion, although perhaps not everyone would agree. The use of (q + p)ⁿ instead of (say) (q + tp)ⁿ as a generating function (as on p. 30) does not seem very happy, since obviously (q + p)ⁿ = 1. The consistent use of 'standard error' for 'standard deviation' seems also a little odd. The usual practice for geneticists now is to concentrate attention on gene frequencies, rather than the genotype frequencies considered in chapter viii; although this would probably simplify the calculations, the point is not very important in the context. The French spelling 'Tchebycheff', though in common use, is irritating, but can perhaps be blamed on the man himself, although he was really 'Chebyshëf'. And finally, the notation Γ(x + 1) for x!, often used when x is not integral, seems merely designed to make the formulae unnecessarily complicated.

Summing up, we may say that the book would be a most useful addition to the library of any statistician or student of statistics.

CEDRIC A. B. SMITH


The Fundamentals of Statistics. By T. L. Kelley. 756 pp. Harvard University Press, and Oxford University Press. 1947. Price 55s.

This book is an introductory text to statistical theory. The author aims at setting out the logical processes which are involved in the application of any piece of statistical technique and generally succeeds with admirable clarity. Many who are interested in what statistical methods are about but do not care for mathematics may read this text with profit, for the mathematical theory is kept to a minimum in the first 200 pages. It is perhaps open to question whether Prof. Kelley achieves the clarity with his mathematical analysis that he does with his exposition of general principles. The reviewer found his notation cumbersome in places, and it is likely that a student coming fresh to the subject will very often find himself in difficulties. For example, the section on sequential analysis gives the outline only of the mathematical analysis. Reference is given to other publications whereby the student may supplement this analysis, but for the elementary student these other publications will prove too difficult.

On the whole it would be fair to say that this book is a useful addition to statistical text-books, although it will never take the place of such old and tried classics as Yule and Kendall's Elementary Theory of Statistics, for the student beginning the study of statistical methods. Biologists, medicals, etc., will probably find it useful to read before attempting R. A. Fisher's Statistical Methods for Research Workers.


F. N. DAVID



































